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ment  for  use  in  an  IPUS  Radar  Clutter  Analysis  Testbed.  Though  not  as  well-developed 
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of  radar  returns.  There  has  also  been  significant  development  of  knowledge  for  weak 
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to  hypothesize  the  distribution  of  data  in  a  clutter  patch  based  on  a  small  amount  of 
data.  Also,  techniques  have  been  developed  for  partitioning  the  radar  surveillance 
volume  into  background  noise  and  clutter  patches,  for  weak  signal  detection  in 
K-distributed  clutter,  and  the  efficient  use  of  Rejection  Theorem  for  Weibull  clutter 
generation.  Though  the  effort  on  the  application  of  IPUS  to  communications  was  given 
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weak  signal  detection  in  communication  systems  subject  to  spherically  invariant 
random  processes  (SIRP)  interference. 


Abstract 


This  investigation  is  motivated  by  the  problem  of  weak  signal  detection  in 
a  strong  clutter  background.  The  concept  of  the  Locally  optimum  Detector 
has  been  used  to  address  this  problem.  The  problem  of  weak  signal  detection 
has  been  extensively  addressed  in  the  literature  when  the  received  radar 
samples  can  be  modeled  as  independent  and  identically  distributed.  However, 
this  issue  has  not  received  much  attention  when  the  received  radar  samples 
are  correlated  and  have  a  non-Gaussian  probability  density  function.  Also, 
performance  analysis  is  not  generally  carried  out  for  finite  sample  sizes. 

This  thesis  addresses  the  performance  of  Locally  Optimum  Detectors  in 
radar  weak  signal  detection  for  finite  sample  sizes  where  the  radar  distur¬ 
bance  is  modeled  as  a  correlated  non-Gaussian  random  process.  The  theory 
of  Spherically  Invariant  Random  Process  is  used  for  statistical  characteri¬ 
zation  of  non-Gaussian  radar  clutter.  In  particular,  the  K-distribution  and 
the  student-T  distributions  have  been  considered  as  models  for  radar  clutter. 

A  canonical  form  is  established  for  the  Locally  Optimum  Detector  that  is 
a  product  of  the  Gaussian  linear  receiver  and  a  zero  memory  nonlinearity. 

The  functional  form  of  the  zero  memory  nonlinearity  depends  on  the  approx¬ 
imation  used  for  the  underlying  radar  clutter  probability  density  function. 

Since  the  weak  signal  detector  is  nonlinear,  thresholds  for  specified  false  alar- 
m  probability  cannot  be  established  in  closed  form.  Given  a  specified  false 
alarm  probability  a  new  method  for  threshold  estimation  based  on  extreme 
value  theory  is  derived  that  reduces  by  orders  of  magnitude  the  computation 
and  sample  size  required  to  set  the  threshold.  Once  the  threshold  is  set  the 
performance  of  the  Locally  Optimum  Detector  is  carried  out  for  finite  sample 
sizes  through  computer  simulations.  Finally,  the  concept  of  the  Amplitude 
Dependent  Locally  Optimum  Detector  is  introduced  which  has  significantly 
improved  performance  over  the  Locally  Optimum  Detector  for  K-distributed 
clutter.  The  performance  evaluation  of  the  Amplitude  Dependent  Locally 
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Chapter  1 


Introduction 

1.1  Weak  Signal  Problem 

In  radar  applications  it  is  found  that  the  received  target  signal  is  contaminated 
with  clutter  and  thermal  noise.  The  received  signal  due  to  undesired  reflections  from 
land,  sea,  atmosphere  etc.  is  called  clutter.  The  thermal  noise,  which  is  generated 
by  the  receiver  hardware,  is  typically  modeled  as  a  Gaussian  random  process.  This 
kind  of  noise  is  always  present.  Depending  upon  the  situation,  the  clutter  may  or 
may  not  be  modeled  as  a  Gaussian  random  process.  Also,  the  power  associated  with 
the  background  clutter  may  be  orders  of  magnitude  larger  than  the  receiver  thermal 
noise  or  the  desired  signal  power. 

In  modern  radars,  temporal  and  spatial  processing  are  used  to  separate  the  target 
from  the  clutter.  For  example,  the  received  signal  from  a  target  having  a  radial 
velocity  with  respect  to  the  radar  will  experience  a  Doppler  shift.  If  the  target 
spectrum  appears  in  the  tail  of  the  clutter  spectrum,  then  conventional  frequency 
domain  techniques  can  be  used  to  extract  the  target  from  the  clutter.  Similarly,  if 
the  spatial  spectrum  of  the  target  does  not  overlap  that  of  the  clutter,  performance 
will  be  limited  by  the  background  noise  rather  than  the  clutter.  In  this  research  use  is 
also  made  of  temporal  and  spatial  processing.  However,  we  are  interested  in  the  case 
where  the  target  temporal  and  spatial  spectra  cannot  be  separated  from  the  strong 
clutter.  By  definition,  this  is  referred  to  as  the  weak  signal  detection  problem.  Given 
a  Range- Doppler- Azimuth  cell  in  which  a  target  is  to  be  detected,  it  is  assumed  that 
the  signal  is  larger  than  the  background  noise  but  much  smaller  than  the  clutter. 

In  the  weak  signal  problem  the  performance  is  limited  by  the  clutter  even  after 
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temporal  and  spatial  processing.  Therefore,  it  becomes  very  important  to  identify 
the  clutter  plus  noise  probability  density  function.  This  density  function  is  the  Nth 
order  joint  density  function  of  the  received  radar  samples  ri,r2, ... ,r ^  in  the  absence 
of  a  target  signal.  The  received  waveform  can  be  modeled  as  a  random  process.  Since 
we  will  be  sampling  this  process  at  N  time  instants,  we  need  to  know  the  Nth  order 
joint  probability  density  function  (PDF)  of  the  N  random  variables.  In  this  research 
effort  the  performance  measures  of  radar  receivers  are  analyzed,  given  the  Nth  order 
PDF  associated  with  the  random  process. 

In  the  hypothesis  testing  problem,  where  we  have  to  decide  whether  the  target  is 
present  or  absent,  two  kinds  of  errors  can  occur:  1)  A  false  alarm  which  occurs  when 
it  is  decided  that  the  target  is  present  when  it  is  not,  2)  A  miss  which  occurs  when 
it  is  decided  that  the  target  is  not  present  when  it  is.  In  many  radar  problems  the 
chosen  criterion  is  to  fix  the  probability  of  false  alarm  at  a  certain  value  and  then  to 
maximize  the  probability  of  detection.  In  statistical  decision  theory  the  Likelihood 
Ratio  Test  (LRT)  is  optimum  for  these  kinds  of  problems.  The  LRT  evaluates  the 
likelihood  ratio  which  is  the  ratio  of  the  Nth  order  joint  PDF  under  the  alternative 
hypothesis  Hi  (signal  present)  to  the  Nth  order  joint  PDF  under  the  null  hypothesis 
Ho  (signal  not  present).  This  ratio  is  then  compared  to  a  certain  threshold  to  make  a 
decision.  Under  the  constraint  of  a  fixed  false  alarm  probability,  the  Neyman-Pearson 
receiver  obtained  on  the  basis  of  the  likelihood  ratio  test  is  the  optimum  receiver. 

The  components  of  the  received  vector  r  can  be  written  mathematically  as 


Hi  :  r{  =  Si  +  d{  (1.1) 

H0:  r,-  =  di  i  =  l,2..JV  (1.2) 

where  s,-  and  di  represent  the  ith  sample  of  the  desired  signal  return  and  the  additive 
disturbance,  respectively.  Also,  let  /g(r|if0),  /c(d)  denote  the  Nth  order 

PDFs  of  R  under  Hi,  of  R  under  Ho  and  of  the  disturbance.  In  general,  the  distur¬ 
bance  may  be  composed  of  clutter  plus  noise.  Since  it  is  not  possible  to  separate  the 
clutter  and  noise  components  of  the  disturbance  when  the  disturbance  is  measured, 
we  focus  on  the  disturbance  itself.  As  the  signal  becomes  very  weak  (i.e.  as  the 
signal  to  clutter  plus  noise  ratio  (SCNR)  approaches  zero),  the  numerator  and  the 
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denominator  of  the  LRT  tend  to  become  identical.  This  is  due  to  the  fact  that 

«  fdr\Ho)  =  h{d).  (1.3) 

This  will  result  in  the  likelihood  ratio  being  approximately  equal  to  unity  independent 
of  the  received  signal.  Thus,  if  T,  denotes  the  likelihood  ratio, 

t oo  too 

PD  —  MT.\Hi)dt.*PF=  MT.\Ho)dt.  (1.4) 

Jr)  Jr) 

where  Pd  and  Pf  represent  the  detection  and  false  alarm  probabilities.  Therefore, 
the  LRT  performs  poorly  in  the  limit  as  the  signal  strength  tends  to  zero. 

Even  though  the  problem  of  weak  signal  detection  in  radar  applications  is  of  great 
interest,  most  of  the  literature  by  various  researchers  has  been  devoted  to  strong 
signals  in  a  clutter  plus  noise  background.  Optimal  and/or  very  good  sub-optimal 
schemes  have  been  proposed  to  achieve  the  desired  level  of  performance.  Only  a  rela¬ 
tively  small  fraction  of  the.  literature  is  devoted  to  the  design  of  practical  schemes  for 
the  detection  of  weak  signals.  In  this  study  we  present  a  general  theory  for  developing 
practical  detector  structures  for  weak  signal  problems.  Also,  computer  simulation- 
s  are  used  to  evaluate  performance  when  the  disturbance  can  be  approximated  by 
the  multivariate  student-T  and  K-distributions.  In  such  problems  the  concept  of  the 
Locally  Optimum  Detector  ( LOD )  is  used  to  come  up  with  the  decision  rule  which 
is  also  a  ratio  test.  For  a  deterministic  signal,  a  statistic  is  obtained  by  taking  the 
ratio  of  the  derivative  with  respect  to  the  signal  strength  of  the  Nth  order  joint  PDF 
under  H\  to  the  Nth  order  joint  PDF  under  Ho-  The  limit  of  this  ratio  as  the  signal 
strength  tends  to  zero  is  evaluated  to  obtain  the  test  statistic  for  the  decision  rule.  In 
the  random  signal  case  the  test  statistic  is  a  ratio,  in  the  limit  as  the  signal  strength 
tends  to  zero,  of  the  second  derivative  with  respect  to  the  signal  strength  of  the  Nt>l 
order  joint  PDF  under  Hi  to  the  Nth  order  joint  PDF  under  Hq.  This  approach  is 
valid  when  it  is  known  that  the  SCNR  ratio  is  very  small  but  the  actual  value  of  SC- 
NR  is  unknown.  Thus,  the  LOD  turns  out  to  be  a  Uniformly  Most  Powerful  (UMP) 
test  for  the  class  of  problems  where  the  SCNR  is  in  the  neighborhood  of  zero. 
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1.2  Non-Gaussian  Correlated  Data 

Previously,  general  analytic  expressions  for  the  various  applicable  Ntfl  order  joint 
non-Gaussian  PDFs  which  allow  for  correlation  between  the  variables  were  unavail¬ 
able.  As  a  result,  researchers  in  the  past  assumed  independence  between  the  samples. 
By  assuming  independence  between  the  samples,  they  were  able  to  get  the  Nth  order 
PDF  as  a  product  of  the  marginal  density  functions.  If  we  carry  out  the  locally  opti¬ 
mum  test  using  the  Nth  order  density  function  based  upon  independence  and  evaluate 
its  performance,  it  is  found  that  an  unreasonably  large  number  of  samples  is  needed 
for  acceptable  performance.  This  arises  because  independent  samples  imply  a  white 
spectrum.  Consequently,  space-time  processing  cannot  be  used  to  filter  the  target 
from  the  clutter.  Based  on  the  concept  of  Spherically  Invariant  Random  Processes 
(SIRP),  analytical  expressions  for  some  Nth  order  joint  non-Gaussian  PDFs  which 
allow  for  correlation  between  the  variables  are  now  available.  Use  of  the  multivariate 
expressions  for  the  non-Gaussian  PDFs  and  the  theory  of  locally  optimum  detectors 
enables  receiver  structures  for  weak  signal  detection  to  be  derived. 

1.3  Thesis  Organization 

The  literature  review  on  weak  signal  detection  and  the  derivation  of  the  locally 
optimum  detector  are  presented  in  Chapter  2.  It  is  shown  that  the  LOD  determines 
whether  a  target  is  present  or  absent  by  comparing  a  statistic  computed  from  the 
data  to  a  set  threshold.  Both  deterministic  and  random  target  signals  are  considered. 
The  receiver  structures  are  specialized  to  the  case  for  which  the  clutter  plus  noise  can 
be  approximated  by  an  SIRP. 

Since  the  clutter  is  assumed  to  be  non-Gaussian,  the  LOD  receiver  structure  turns 
out  to  be  nonlinear.  As  a  result,  system  performance  must  be  determined  by  means 
of  computer  simulation.  The  threshold  is  conventionally  determined  through  a  Monte 
Carlo  procedure.  Unfortunately,  the  number  of  trials  is  inversely  proportional  to  the 
false  alarm  probability,  Pp.  For  example,  when  Pp  =  10-6,  a  minimum  of  ten  million 
trials  need  to  be  generated.  To  avoid  carrying  out  so  many  trials,  a  new  technique, 
based  on  extreme  value  theory,  is  presented  in  Chapter  3.  It  is  demonstrated  that 
fairly  accurate  thresholds  can  be  determined  for  false  alarm  probabilities  as  small  as 
10-7  with  as  few  as  10,000-30,000  trials. 

Assuming  that  the  clutter  plus  noise  can  be  approximated  by  either  the  multivari- 
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ate  student-T  or  K-distributions,  the  LOD  is  developed  in  Chapter  4  for  the  weak 
signal  detection  problem.  The  system  performance  is  evaluated  by  means  of  computer 
simulation  for  each  distribution.  It  is  shown  that  the  performance  improvement  for 
the  LOD  is  significant  compared  to  the  linear  receiver  when  the  clutter  plus  noise  is 
approximated  by  the  student-T  distribution.  However,  the  performance  improvement 
compared  to  the  linear  receiver  is  not  quite  as  significant  when  the  clutter  plus  noise 
is  approximated  by  the  K-distribution. 

To  enhance  the  performance  of  the  LOD  for  weak  signal  detection  when  the  clutter 
plus  noise  is  approximated  as  multivariate  K-distributed,  a  new  technique  called  the 
amplitude  dependent  locally  optimum  detector  is  presented.  This  test  however,  is  not 
a  uniformly  most  powerful-test.  Based  on  this  test,  it  is  demonstrated  that  significant 
performance  improvements  can  be  obtained  compared  to  the  linear  receiver  even  when 
the  clutter  plus  noise  is  multivariate  K-distributed. 

Summary  and  conclusions  are  presented  in  Chapter  6. 
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Chapter  2 

The  Locally  Optimum  Detector 


2.1  Literature  Review 

The  concept  of  the  locally  optimum  detector  was  first  established  by  Neyman  and 
Pearson  in  1930  [1,  2].  Subsequently  this  was  applied  to  statistical  communication 
and  signal  processing  by  several  researchers. 

David  Middleton’s  work  [3,  4]  on  the  LOD  is  based  on  expanding  the  LRT  in  terms 
of  a  power  series  expansion  and  truncating  the  series  to  a  first  order  approximation. 
In  the  limit  as  the  signal  tends  to  zero,  the  canonical  structure  of  the  locally  optimum 
detector  is  established  with  very  weak  restrictions  on  the  statistical  properties  of  signal 
and  noise.  The  analysis  applies  equally  well  to  non-Gaussian  as  well  as  Gaussian  and 
non-stationary  as  well  as  stationary  processes,  for  stochastic  as  well  as  deterministic 
signals,  continuous  as  well  as  discrete  time  signals  and  for  combinations  of  signal 
and  noise  that  need  not  be  additive.  In  fact,  the  general  character  of  the  results  is 
independent  of  the  particular  nature  of  the  signal  and  noise,  although  specific  noise 
distributions  determine  the  specific  detector  structures.  Middleton  shows  that  the 
locally  optimum  detector  is  a  threshold  detector  with  very  strong  optimality  features 
in  the  limit  of  an  infinitely  large  number  of  samples.  However,  in  our  research,  we 
are  interested  in  applications  where  the  number  of  samples  may  not  be  too  large. 

David  Middleton  [5]  has  also  extended  the  problem  of  threshold  or  weak  signal 
detection  to  vector  fields  involving  highly  non-Gaussian  electromagnetic  interference 
and  signals  which  are  both  narrowband.  The  emphasis  is  on  a  canonical  formulation. 
However,  the  performance  measures  presented  are  obtained  on  the  basis  of  asymptoti¬ 
cally  locally  optimum  algorithms.  In  [6]  Middleton  has  also  analyzed  the  performance 
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of  the  locally  optimum  detectors  in  correlated  interference.  He  points  out  that  the 
correlation  function  involves  only  the  second  order  statistics  whereas  higher  order 
statistics  should  be  considered  for  the  non-Gaussian  case.  Consequently,  the  corre¬ 
lated  noise  model  leads  to  a  suboptimal  receiver  except  when  the  underlying  noise 
distribution  is  Gaussian.  But  when  the  sample  to  sample  correlation  is  strong,  the 
resulting  algorithms  and  performance  measures  can  provide  noticeable  improvements 
over  models  which  employ  independent  noise  sampling  assumptions. 

Other  researchers  in  this  area,  such  as  J.H.  Miller  and  John  Thomas  [7]  and  Saleem 
Kassam  [8],  have  obtained  performance  of  the  LOD  under  the  asymptotic  condition 
of  an  infinitely  large  number  of  samples.  These  researchers  have  modeled  the  noise 
samples  as  independent,  identically  distributed  random  variables.  This  enables  them 
to  have  a  closed  form  expression  for  the  Nth  order  PDF  of  multivariate  non-Gaussian 
noise.  Applying  the  LOD  test  they  have  arrived  at  the  decision  statistic.  Using 
the  central  limit  theorem,  the  test  statistic  is  shown  to  approach  Gaussian  in  the 
limit  of  very  large  sample  size.  Then  the  performance  measures  are  evaluated.  Song 
and  Kassam  [9],  [10]  have  also  derived  the  locally  optimum  detectors  for  both  known 
signals  and  random  signals  in  a  generalized  observation  model.  In  this  model  additive, 
multiplicative  and  signal  dependent  noise  models  are  considered.  They  show  that 
the  detectors  derived  under  this  model  are  interesting  generalizations  of  the  locally 
optimum  detectors  derived  in  the  additive  noise  model  case.  They  also  analyze  the 
performance  of  the  detector  for  a  finite  sample  size  case.  But,  the  underlying  noise 
distribution  is  assumed  to  be  bivariate  Gaussian. 

For  a  variety  of  detection  problems,  Jack  Capon  [11]  concludes  that  implementation 
of  the  LOD  is  either  less,  or  no  more  complicated  than  the  Neyman- Pearson  detec¬ 
tor.  First,  he  proposes  the  locally  optimum  detector  for  weak  signal  applications  and 
proceeds  to  evaluate  its  performance  by  comparing  it  with  the  Neyman-Pearson  de¬ 
tector.  The  comparison  is  based  on  the  concept  of  Asymptotically  Relative  Efficiency 
(ARE).  ARE  is  defined  as  the  ratio  of  sample  sizes  required  for  two  different  detectors 
to  achieve  the  same  error  probability  and  for  the  same  signal  to  noise  ratio,  as  the 
signal  to  noise  ratio  tends  to  zero  and  the  sample  sizes  tend  to  infinity.  On  the  basis 
of  this  comparison,  it  is  shown  that  the  locally  optimum  detector  is  asymptotically 
as  efficient  as  the  Neyman-Pearson  detector.  Conte,  Izzo,  Longo  and  Paura  [12]  have 
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also  considered  the  problem  of  weak  signal  detection  using  locally  optimum  detectors 
for  arbitrarily  large  sample  sizes  and  show  that  significant  improvements  are  achieved 
compared  to  the  linear  detector.  However,  when  they  implement  their  algorithm  for 
finite  sample  sizes,  they  conclude  that  the  promises  of  the  asymptotic  theory  cannot 
be  achieved  using  even  moderately  large  sample  sizes.  Hence,  they  propose  a  scheme 
which  is  a  hybrid  of  the  asymptotically  optimum  detector  and  the  linear  detector. 
Asymptotically  effective  nonparametric  algorithms  for  detecting  weak  signals  in  non- 
Gaussian  interference  have  also  been  considered  by  Valeyev  and  Aspisov  [13].  They 
conclude  that  the  effectiveness  of  such  algorithms  approaches  that  of  the  optimum 
asymptotically,  in  the  limit  of  large  sample  size.  Raveendra  and  Srinivasan  [14]  have 
derived  the  locally  optimum  receiver  structure  for  the  coherent  detection  of  contin¬ 
uous  phase  frequency  shift  keying  (CPFSK)  in  non-Gaussian  noise  channels.  They 
evaluate  the  performance  of  the  receiver  which  consists  of  a  zero  memory  nonlinearity 
followed  by  a  correlator  for  a  number  of  noise  models.  The  measure  of  performance  is 
the  asymptotic  relative  efficiency.  However,  they  point  out  two  important  drawbacks 
in  the  analysis  of  performance  through  the  ARE.  The  first  is  that  while  large  sam¬ 
ple  sizes  are  desirable  for  weak  signal  detection,  increasing  the  sample  size  actually 
makes  the  LOD  suboptimal  partly  due  to  the  fact  that  there  is  an  increased  effect  of 
higher  order  terms  in  the  expansion  of  the  likelihood  ratio.  Secondly,  under  increas¬ 
ing  sampling  rates,  the  assumption  of  independent  samples,  used  in  the  derivation  of 
the  weak  signal  detector  under  non-Gaussian  conditions  becomes  invalid.  In  a  Naval 
Underwater  Systems  Center  report,  Raymond  Ingram  and  R.  Houle  [15]  analyze  the 
performance  of  the  optimum  and  several  suboptimum  receivers  for  weak  signal  detec¬ 
tion  of  known  signals  in  additive,  white,  non-Gaussian  noise.  He  concludes  that  the 
implementation  of  the  optimum  or  suboptimum  nonlinear  receivers  yield  significant 
improvements  in  performance  relative  to  the  receiver  that  is  optimum  in  Gaussian 
noise.  However,  the  receiver  structure  is  more  complicated  than  the  linear  receiver. 

The  structure  of  locally  optimum  detectors  has  also  been  characterized  in  terms  of 
locally  optimum  estimators  and  correlators  [16].  These  characterizations  are  canonical 
structures  involving  estimators-correlators.  It  is  shown  that  if  the  one  step  signal 
predictor  is  recursive  and  the  noise  is  white  (possibly  non-Gaussian  or  nonstationary), 
there  is  a  canonical  structure  that  admits  recursive  computation.  The  motivation  to 
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get  a  recursive  structure  is  to  simplify  implementations  and  enhance  adaptability.  The 
problem  of  signal  design  has  been  considered  for  the  case  of  locally  optimum  detection 
by  Johnson  and  Orsak  [17].  They  show  for  the  weak  signal  problem  as  we  have  defined 
it  in  Chapter  1  (Introduction),  that  the  detection  performance  depends  on  signal 
energy  in  proportion  to  the  Fisher  information  for  location.  In  other  words,  when  the 
spectra  of  the  signal  and  disturbance  overlap  completely,  significant  improvements  do 
not  result  from  signal  design.  Another  useful  result  they  point  out  is  that,  among  all 
distributions  having  zero  mean  and  the  same  (finite)  variance,  the  distribution  having 
the  smallest  Fisher  information  is  the  Gaussian.  Because  of  this  result,  it  is  concluded 
that  detecting  a  small  signal  in  Gaussian  noise  is  the  most  difficult  situation  possible 
for  an  optimal  detector.  An  increase  in  the  signal  energy  yields  the  smallest  possible 
performance  improvement.  Johnson  and  Orsak  also  come  with  explicit  expressions  to 
quantify  the  “small  signal  regimes”  depending  on  the  amplitude  distribution  of  the 
noise. 

Arthur  Spaulding  [18]  compares  the  performance  of  the  Locally  Optimum  Binary 
Detector  (LOBD)  with  that  of  the  linear  receiver  and  a  hard  limiter.  The  performance 
analysis  is  done  via  specific  examples  and  through  computer  Monte  Carlo  simulation. 
Under  the  assumption  of  independent,  identically  distributed  samples,  he  concludes 
that  the  LOBD  approaches  its  optimum  performance  only  under  the  limit  of  large 
sample  sizes  and  small  signal  to  noise  ratios.  Also,  he  shows  by  way  of  an  example  that 
one  cannot  always  be  assured  of  obtaining  great  improvements  over  the  linear  receiver 
by  using  nonlinear  processing.  This  implies  that  even  if  the  underlying  probability 
density  function  of  the  interference  has  a  much  larger  tail  than  that  of  the  Gaussian, 
it  does  not  guarantee  a  much  improved  performance  over  the  linear  processor.  The 
improvement  in  performance  depends  on  the  particular  nature  of  the  underlying  PDF. 
Michael  Bouvet  [19]  obtains  the  locally  optimum  detector  by  expanding  the  likelihood 
ratio  and  truncating  the  expansion  under  the  weak  signal  assumption.  He  expands  the 
likelihood  ratio  in  two  ways:  one  with  respect  to  the  signal  and  the  other  with  respect 
to  the  observation.  He  then  establishes  the  equivalence  between  the  two  different 
forms  of  expansion.  However,  he  points  out  the  limitations  of  these  expansions  by 
stating  that  the  results  are  valid  only  if  the  neglected  terms  are  actually  negligible 
with  respect  to  the  retained  terms.  Since  the  received  observations  are  random,  this 
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cannot  always  be  guaranteed. 

Shishkov  and  Penev  [20]  have  considered  the  case  of  correlated  interference  and 
background  white  noise,  but  have  restricted  themselves  to  multivariate  Gaussian  in¬ 
terference.  For  the  known  signal  case,  when  the  underlying  noise  distribution  is 
Gaussian,  the  optimal  detector  obtained  from  the  LRT  is  the  same  as  the  weak  signal 
detector  obtained  from  the  locally  optimum  test.  Modestino  and  Ningo  [21]  were  a- 
mong  the  earliest  researchers  to  consider  weak  signal  detection  arising  from  bandpass 
processes.  They  have  modeled  the  received  signal  as  statistically  independent  complex 
samples  and  then  obtained  the  joint  density  function  of  the  inphase  and  quadrature 
components.  Under  the  assumption  that  the  clutter  density  function  is  circularly 
symmetric,  they  transform  the  joint  density  function  to  an  equivalent  one  involving 
the  envelope  and  phase.  This  model  still  does  not  include  the  correlation  from  one 
envelope  sample  to  another  and  hence  large  sample  sizes  are  required  for  good  per¬ 
formance.  Martinez,  Swaszek  and  Thomas[22],  have  considered  the  case  where  the 
noise  has  a  multivariate  Laplace  distribution,  where  any  non-negative  definite  matrix 
can  be  used  to  model  the  correlation  between  the  random  variables.  Based  on  this 
model  they  go  ahead  and  derive  the  locally  optimum  detector  which  as  expected  is 
nonlinear.  They  compare  the  performance  of  this  detector  to  the  one  developed  by 
assuming  the  noise  to  be  independent  and  identically  distributed,  and  to  the  matched 
filter.  They  compare  the  performance  of  these  receivers  through  the  means  of  ARE 
and  do  not  analyze  the  receiver  performance  for  small  sample  sizes  which  is  the  case 
of  practical  interest. 

Sangston  and  Gerlach  [23]  have  used  the  the  concept  of  the  Spherically  Invariant 
Random  Processes  to  model  multivariate  non-Gaussian  probability  density  functions. 
They  derive  the  locally  optimum  detector  based  on  this  noise  model.  It  turns  out 
that  the  locally  optimum  detector  structure  is  the  matched  filter  in  conjunction  with  a 
nonlinearity.  They  are  able  to  establish  the  canonical  nature  of  this  result  for  the  class 
of  joint  density  functions  arising  from  SIRPs.  They  propose  an  equivalent  structure 
for  the  LOD.  This  takes  the  form  of  a  receiver  in  which  the  matched  filter  output  is 
compared  to  a  nonlinear  adaptive  threshold.  However,  they  do  not  explore  the  issue 
in  terms  of  performance  analysis. 
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2.2  Spherically  Invariant  Random  Processes  (SIRP) 

In  general,  the  radar  receiver  processes  N  complex  samples  (or  2 N  quadrature 
components)  from  each  resolution  cell.  To  develop  an  optimal  receiver,  it  is  neces¬ 
sary  to  have  a  closed  form  analytical  expression  for  the  joint  PDF  of  the  received 
samples.  When  the  N  samples  are  statistically  independent,  the  joint  PDF  is  simply 
the  product  of  the  marginal  PDFs.  However,  clutter  samples  are  likely  to  be  corre¬ 
lated.  Because  this  correlation  is  useful  in  canceling  the  clutter,  it  is  important  that 
the  correlation  be  modeled.  Unfortunately,  when  the  received  samples  are  correlated 
and  non-Gaussian,  there  are  no  unique  analytical  expressions  for  their  joint  PDFs. 
A  search  of  the  mathematical  and  signal  processing  literature  reveals  that  the  SIRP 
provides  a  powerful  mechanism  for  obtaining  the  PDF  of  N  correlated  non-Gaussian 
random  variables.  SIRPs  for  clutter  modeling  and  simulation  can  be  found  in  [24], 
where  Conte  and  Longo  model  complex  clutter  as  a  spherically  invariant  random  pro¬ 
cess.  They  point  out  that  the  SIRP  is  ergodic  only  if  the  underlying  clutter  process 
is  Gaussian.  This  means  that  time  averages  cannot  be  used  to  approximate  ensem¬ 
ble  averages.  Conte,  Longo  and  Lops  [25]  also  propose  specific  computer  simulation 
procedures  for  generating  clutter  realizations  from  an  SIRP  with  desired  correlation 
properties  when  the  underlying  distribution  is  multivariate  Weibull  or  K-distributed. 
Rangaswamy,  Weiner  and  Ozturk  [26]  have  developed  a  library  of  multivariate  cor¬ 
related  non-Gaussian  PDFs  for  characterizing  various  clutter  scenarios  through  the 
theory  of  SIRPs.  A  significant  result  in  this  paper  is  the  proof  that  the  multivari¬ 
ate  SIRP  PDF  approximation  problem  can  be  reduced  to  an  equivalent  univariate 
PDF  approximation  problem.  In  [27],  Rangaswamy,  Weiner  and  Ozturk  develop  t- 
wo  canonical  computer  simulation  procedures  for  the  generation  of  any  correlated 
non-Gaussian  clutter  that  can  be  modeled  as  a  spherically  invariant  random  process. 

Application  of  the  theory  of  SIRPs  to  the  problem  of  signal  detection  and  estima¬ 
tion  can  be  found  in  [23]  and  [28].  In  [28]  Yao  derives  the  form  of  the  unit  threshold 
likelihood  ratio  receiver  for  the  detection  of  a  known  deterministic  signal  in  addi¬ 
tive  SIRP  noise.  He  shows  that  the  optimum  receiver  is  the  linear  receiver  or  the 
matched  filter  when  the  threshold  is  set  to  unity,  a  threshold  that  commonly  arises 
in  communication  systems.  This  result  is  very  significant  because  it  tells  us  that 
nonlinear  processing  will  not  improve  performance  when  the  threshold  is  set  to  unity 
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even  though  the  disturbance  is  non-Gaussian.  However,  when  the  threshold  is  not 
unity,  then  the  optimal  non-Gaussian  receiver  is  a  nonlinear  receiver.  Pentini,  Farina 
and  Zirilli  [29]  consider  the  problem  of  detecting  a  known  target  and  a  Swerling  zero 
target  embedded  in  coherent  K-distributed  clutter.  The  detectors  are  derived  based 
on  the  likelihood  ratio  test  where  the  multivariate  joint  density  functions  used  in 
the  test  are  obtained  from  the  theory  of  SIRPs.  The  receiver  performance  is  then 
evaluated  for  the  strong  signal  case.  The  false  alarm  probabilities  used  in  obtaining 
the  receiver  performance  are  10-3  and  10~4  so  as  to  reduce  the  number  of  Monte 
Carlo  trials  needed  to  set  thresholds.  For  a  Swerling  zero  target  model,  they  obtain 
a  probability  of  detection  equal  to  0.1  for  a  false  alarm  probability  of  10-3  using  four 
complex  samples  and  a  signal  to  clutter  ratio  of  —10  dB.  However,  they  do  not  explore 
performance  for  lower  values  of  signal  to  clutter  ratios. 

2.3  The  Derivation  of  the  Locally  Optimum  Detector 

The  usual  criterion  in  radar  problems  is  to  maximize  the  probability  of  detection 
under  a  fixed  false  alarm  probability  constraint.  This  receiver  is  called  the  Neyman- 
Pearson  receiver.  The  receiver  implements  the  Likelihood  Ratio  Test  (LRT)  and 
compares  it  against  a  threshold  whose  value  is  designed  to  give  the  desired  false 
alarm  probability.  In  particular,  consider  the  received  vector  RT  =  [i21?  R2, ...,  Rn]- 
Introduce  the  two  hypotheses  H0  and  Hi  as  described  below: 


H0  :  r,-  =  Ci  +  nt-  (2.1) 

H o  :  r,  =  6si  +  c,-  4-  n,-  i  =  1,2...N.  (2.2) 


Thus,  Ho  is  the  hypothesis  that  the  received  signal  consists  solely  of  clutter  plus 
noise  while  target  signal  is  assumed  to  be  present  under  the  hypothesis  Hi .  Let  the 
joint  probability  density  function  of  Ru  R2,  ...,Rn  under  hypothesis  Hk  ( k  =  0, 1)  be 
denoted  by  fii(r\Hk).  The  Neyman- Pearson  receiver  performs  the  LRT 


t*(r) 


fnW i)  H>\ 

fR(r\Ho)  V 


where  T)  is  specified  to  satisfy  the  false  alarm  constraint 


Pf=  I"  fT.{ts\HQ)dts 

Jt) 


(2.3) 


(2.4) 
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and  fx,(ts\Hk)  is  the  conditional  probability  density  function  of  the  test  statistic  Ta 
given  hypothesis  Hk. 

However,  when  the  signal  strength  is  very  small  relative  to  the  clutter  plus  noise,  the 
joint  density  function  of  the  received  random  variables  under  H\  approaches  that  un¬ 
der  Ho.  Then  the  numerator  and  the  denominator  of  the  LRT  become  approximately 
equal  leading  to  numerical  difficulties  in  discriminating  between  the  two  hypotheses. 
The  Neyman- Pearson  test  is  of  course  optimum.  However,  the  form  of  the  LRT  can 
be  rearranged  to  yield  a  test  statistic  which  is  more  sensitive  to  perturbations  in  the 
received  data.  This  gives  rise  to  the  concept  of  the  Locally  Optimum  Detector  (LOD). 
In  this  chapter  the  concept  of  the  LOD  is  developed  in  detail  using  two  approaches. 
The  first  approach  is  based  on  a  power  series  expansion  of  the  LRT  and  the  second 
approach  derives  the  LOD  by  an  optimization  using  the  principle  of  Lagrangian  mul¬ 
tipliers.  It  is  shown  that  both  approaches  yield  identical  detector  structures,  though 
starting  from  different  theoretical  points  of  view.  As  the  signal  strength  becomes 
weaker,  the  LOD  becomes  optimum  even  though  its  performance  for  a  fixed  sample 
size  may  not  be  as  good  as  desired. 

2.4  The  Series  Approach 

2.4.1  The  Known  Signal  Case 

Let  the  additive  clutter  component  C  =  [Ci,C2,  —,Cn]t  be  stationary  and  inde¬ 
pendent  of  the  stationary  white  Gaussian  background  noise  A  =  [Ni,  N2, Nn]T • 
The  noise  variance  o\  is  assumed  to  be  several  orders  of  magnitude  below  the  clut¬ 
ter  variance  cr2  which  is  taken  to  be  unity  without  loss  of  generality.  The  signal  is 
assumed  to  be  of  the  form  Os,  where  s  is  known.  The  components  of  s  are  chosen  to 
have  |s,|2  =  1  so  that  the  positive  parameter  0  is  a  measure  of  the  signal  to  clutter 
ratio  (SCR)  defined  by 

SCR  =  =  °2  •  (2.5) 

Because  the  clutter  and  noise  are  statistically  independent  with  the  noise  assumed  to 
have  zero  mean,  the  covariance  matrix  of  the  disturbance  vector  D_  —  C_  +  N_,  denoted 
by  Mq,  is  equal  to  the  covariance  matrix  of  the  clutter  Me  plus  the  covariance  matrix 
of  the  noise  M n.  Since  the  noise  is  white  and  stationary,  the  covariance  matrix  of 
the  noise  is  of  the  form  M n  =  cr2/,  where  I  is  the  identity  matrix.  When  the  clutter 
is  highly  correlated,  the  covariance  matrix  Me  tends  to  be  ill-conditioned.  However, 
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Md  will  not  be  ill-conditioned  because,  by  adding  the  small  value  a\  to  the  diagonal 
elements  of  Me,  the  smallest  eigenvalue  of  Md  is  guaranteed  to  be  no  smaller  than 
<7^.  Also,  addition  of  Mn  to  Me  ensures  that  the  disturbance  spectrum  will  limit 
performance  even  in  those  frequency  intervals  where  the  clutter  spectrum  is  negligible. 

With  this  assumption  the  LRT  takes  the  form 

,  _  /afclff.)  .  JWc-fe)  5  .... 

-  m m  ~  ~m~  (26) 

As  mentioned  previously,  when  0  1,  the  signal  OS  represents  a  small  perturbation 

in  the  received  vector  under  hypothesis  H\.  Hence,  fR.(r\H\)  approximately  equals 
/r{l\Hq).  As  a  result,  Ts  is  relatively  insensitive  to  6s.  One  approach  at  deriving  a 
weak  signal  detector  is  to  expand  the  numerator  of  the  LRT  in  a  Taylor  series. 

For  this  purpose,  let  y  =  r  —  Os.  Then 

fdr.\H,)  =  foil).  (2.7) 

Expanding  /d(^)  in  a  Taylor  series  about  the  received  vector  r,  we  obtain 


foiv)  ~  /c(r)  +  2  (y*i  -  rfc») 

*1=1 


N  N 


+  T\  H  S3  (y*i  -  rfci)(2 th  -  rk2) 


dfD(y) , 
dykl  '*=r 

^2fo(y) 


k\  =1  &2  =  1 


dykldyk2 


+  ... 

,  1  £  £  v  w  N  dnfdy)  , 

+  —y  Lj  -  2^  (y*i  -  rk1)(yk2  -  rk2)...{ykn  -  « — o— ly=r 

n!fc1=ifc2=i  fcn=i  oykldyk2...dykn  - 


+  ...  . 


This  can  be  expressed  in  vector  form  by  introducing  the  operator 

(«-  rfv,  =  g(W  -  (2.9) 

where  the  subscript  y  on  V  indicates  partial  differentiation  with  respect  to  the  com¬ 
ponents  of  y.  The  expansion  of  /p(y)  about  the  point  y  =  r  then  becomes 


foiy)  =  Id{l)  +  [(y  -  r)TVy]fD(y) |v=r 
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+  [(]/-£)  vy]  foil) \v=r 


+  fil(  y  -  r.)T^y)nfD(y)\y=t 


OO  1 

=  /Dtri  +  SrrKl-d  v»r-fc(l!)l»=i' 


(2.10) 


Recall  that  y  =  r  —  Os,  where  0  and  s  are  constants.  Note  that  y  —  r  =  —Os  and 
=  Then 

oyk  oTk 


(y  -  r)rv„  =  =  -0/V. 

k—l  * 


(2.11) 


where  the  subscript  r  on  V  indicates  partial  differentiation  with  respect  to  the  com¬ 
ponents  of  r.  It  follows  that  the  expansion  may  be  written  as 


00  (  —  1nf 

fdt  -  0*)  =  frit)  +  E  — ri(H/v.]'7D(r). 

n=l  U ‘ 


(2.12) 


In  order  for  the  above  expansion  to  be  meaningful,  it  is  necessary  that  all  the  deriva¬ 
tives  in  the  above  expansion  exist. 

Thus,  using  the  above  expansion  of  /d{l  ~  Os),  the  Taylor  series  expansion  of  the 
likelihood  ratio  about  the  received  vector  r  in  equation  (2.6)  can  be  written  as 


oo  / _ i  \nnn 

T.( l)  =  1  +  (E  VrM(*rv.)’]/afe). 
,(S  frit) 


(2.13) 


The  first  term,  being  a  constant,  can  be  combined  with  the  threshold  without  loss 
of  optimality.  The  LOD  is  obtained  by  retaining  only  the  term  corresponding  to 
n  =  1  in  the  infinite  summation.  For  0  1,  it  is  assumed  that  the  remaining  terms 

in  the  summation  are  negligible.  On  the  other  hand,  because  r  is  governed  by  a 
random  vector  and  the  partial  derivatives  of  the  PDF  evaluated  at  r  may  be  large, 
the  remaining  terms  may  actually  not  be  negligible.  However,  it  is  assumed  that  this 
occurs  with  small  probability.  The  resulting  detector  structure  can  then  be  expressed 


UrtJ 

(sTVr)frit)  »>■ 

Wd  = - —  <  1* 

where  rj^  is  chosen  so  as  to  achieve  the  desired  false  alarm  probability. 


(2.14) 
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2.4.2  The  Random  Signal  Case 

When  the  signal  is  random,  fR,(r\Hi)  is  obtained  by  integrating  the  joint  density 
function  f r,s{l-,  s\H\)  over  all  possible  values  of  s.  Hence, 

/oo  roo 

fR,s(L^\Hi)ds  =  /  fR\s=,{r\s,  Hi)fs{s)ds  =  £4[/r|s=»(t;U,  Hi)} 

-oo  J— oo  —  _ 

(2.15) 

where  E,  denotes  the  expectation  operation  carried  out  with  respect  to  the  random 
vector  S_.  Because  the  denominator  of  Ts  in  equation  (  2.6)  is  independent  of  s,  the 
Taylor  series  expansion  of  the  likelihood  ratio  can  now  be  written  as 

oo  / _ I  \nOn 

Ur.)  =  1  +  E  (2.16) 

Once  again,  as  in  the  known  signal  case,  the  unity  term  appearing  in  the  test  statistic 
can  be  put  into  the  threshold.  If  we  make  the  assumption  that  the  expected  value  of 
the  signal  vector  is  0,  then  the  n  =  1  term  in  the  infinite  series  of  equation  (  2.16) 
goes  to  zero.  Thus,  for  the  random  signal  case,  where  the  signal  vector  has  zero  mean, 
the  LOD  is  defined  to  be  the  second  term  (n  =  2)  in  the  infinite  series.  As  in  the 
deterministic  signal  case,  0  is  assumed  to  be  small  enough  such  that  the  remaining 
terms  of  the  series  are  negligible  with  high  probability.  Consequently,  the  LOD  for 
the  random  signal  case  is  given  by 

U(t)  =  ;n^£.(UTVr)2]/£(r)  >‘ (2.17) 

£Jd{L)  Ho 

where  Ts2  represents  the  second  order  term  in  the  Taylor  series  expansion  of  Ts.  The 
above  equation  can  be  rewritten  as 

TAl)  =  jfj-E.lVfs  !TV.]/£(r)  >  (2.18) 

where,  as  before,  i]  is  chosen  to  achieve  the  specified  false  alarm  probability.  Lumping 
the  constant  —■  with  the  threshold  and  recognizing  that 

Es[(sTVry]  =  Ea[VjssTVr]  =  VrTPVr,  (2.19) 


16 


where  P  is  the  covariance  matrix  of  the  signal  vector,  then  the  detector  structure  for 
the  locally  optimal  test  becomes 


Tlod(l)  = 


V^Vrfofc)]  >a 


Mr) 


<  IJu- 
»0 


(2.20) 


2.5  The  Lagrangian  approach 

Consider  again  the  hypotheses  testing  problem  defined  in  equations  (2. 1-2.2).  Let 
us  define  a  nonrandomized  decision  rule  (f>(r)  such  that 


Hi  true  (target  present) 
Ho  true  ( target  absent). 


(2.21) 


This  amounts  to  partioning  the  decision  space  into  two  regions,  S\  and  So.  A  target 
is  declared  if  the  vector  r  is  present  in  the  region  Si.  If  it  falls  in  the  region  So, 
then  the  decision  is  made  that  the  target  is  absent.  The  probability  of  detection 
equals  the  probability  that  the  nonrandomized  decision  rule  equals  unity,  given  that 
hypothesis  Hi  is  indeed  true.  This  probability  will,  in  general,  be  a  function  of  0,  the 
signal-to-clutter  ratio.  Denoting  fi(6)  as  the  probability  of  detection  we  have 


Pd  =  m  =  PlAt)  =  =  /“  (2.22) 

J —oo 


fi(0)  is  also  called  the  power  function  of  the  test.  The  false  alarm  probability  is  given 

by 

PF  =  p[4>{r)  =  1\H0)  =  ^  <t>(r)fR(r\H0)dr  =  PF.  (2.23) 

The  optimization  problem  to  be  discussed  in  the  next  section  imposes  the  constraint 
that  the  false  alarm  probability  be  equal  to  PF.  Pp  is  also  defined  to  be  the  significance 
level  of  the  test. 

2.5.1  The  Known  Signal  Case 

As  discussed  earlier,  in  the  limit  as  the  signal  strength  tends  to  zero,  the  probability 
of  detection  becomes  approximately  equal  to  the  probability  of  false  alarm.  Therefore, 
instead  of  maximizing  the  probability  of  detection,  one  approach  is  to  maximize  the 
slope  of  the  power  function  (/ 3(6 ))  curve  at  the  point  0  equal  to  zero.  The  function  to 
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be  maximized  and  the  constraint  are  given  in  the  following  two  equations.  Maximize 


am .  ,  d 


subject  to  the  constraint 


6=0  =  ^d9  S  co  1]*=° 

/  <f>(r)fE(r\H0)dr  =  a. 

J~oo 


(2.24) 


(2.25) 


We  also  require  that  the  test  be  uniformly  most  powerful  (UMP)  in  the  sense  that 

<f>(r)  be  independent  of  6  for  small  neighborhoods  in  the  vicinity  of  9  =  0.  Notice 

that  there  is  a  derivative  with  respect  to  9  outside  the  integral  in  equation  (  2.24).  If 
the  function  /h(i|#i)  is  a  well  behaved  function  of  9  such  that  its  derivative  exists 
at  all  points,  the  derivative  can  be  moved  inside  the  integral  resulting  in 

W  JZ  .Me  =  JZ  (2.26) 

Because  of  the  UMP  requirement,  =  0  and  the  first  integral  on  the  right  side  of 
equation  (2.26)  integrates  to  zero.  It  follows  that 

=  fZ  (2.27) 

Given  the  function  ^p  |g=0  to  be  maximized  along  with  the  false  alarm  probabili¬ 
ty  constraint,  the  functional  form  of  the  maximization  problem  using  the  Lagrange 
multiplier  approach  is 


maX  UL  ^)^^-dr\e=0  +  rjk[a  -  <f>(r)fR(r\H0)dr ]] 
where  rjk  is  the  Lagrange  multiplier.  Expression  (2.28)  can  be  rewritten  as 


(2.28) 


[  f  rf(r)[ 

•/  —  oo 


VkfR(r\Ho)]dr\e=0}  +  Tjka. 


(2.29) 


To  maximize  the  above  integral,  the  decision  regions  should  be  chosen  such  that  the 
integrand  is  always  positive.  In  other  words,  the  decision  regions  are  chosen  such  that 


(2.30) 
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As  was  pointed  out  in  the  previous  section,  fR(r\Hi)  is  identical  to  /d(l— &&)■  There¬ 
fore,  the  decision  rule  becomes 


dfdt-es)  § 

— We — 1*=° 


(2.31) 


The  locally  optimum  detector  is  defined  to  be  that  detector  which  implements  the 


ratio  test 


dfp(r-e$) .  H 

d$  1^=0  >  /r,  p r»\ 

- 7T7 -  <*?*•  U-o2) 

/o(r)  «o 

The  Lagrange  multiplier  77*.  is  chosen  to  satisfy  the  false  alarm  constraint.  Note  that 


/a(r  -  0‘s)  =  /d(J"i  -^5i,r2  -^2,...,rN-0s^). 


(2.33) 


As  a  result, 


5/d(e  -  Os)  _  dfD(r  -  Os)  d(rx  -  0sx)  <9/o(r  -  Os)  d(r2  -  0s2) 

dA  d(  A  \  dA  *  d /  A  \  dA 


d(rx  —  Osi)  dO  d(r2-0s2)  dO 

d/p(r  —  Ol)  djfN  -  Osn) 
d(r^  —  Osn)  dO 

_  A  dfp(r  —  6s) 

^  _  do.  \  '  'Sfc^ 


Consequently, 


-  On)  I  ^/d(i)  .  _  /  .Tr7  X  f  /  \ 

- ^ - Ifeo  -  “g  -  -U  Vr)/o(r). 


(2.34) 


(2.35) 


Thus,  the  locally  optimum  detector  can  also  be  written  as 

r  ,  ,  (sTVr)/a(r)  ». 

Wd - Tw-  ».w' 


(2.36) 


It  can  be  seen  that  this  detector  is  identical  to  the  one  in  equation  (2.14)  obtained 

through  the  series  approach. 

2.5.2  The  Random  Signal  Case 

Consider  a  random  signal  £_  and  let  its  joint  PDF  be  denoted  by  /s(s).  Also, 
without  loss  of  generality,  we  can  make  the  assumption  that  the  signal  vector  has 
zero  mean  and  that  each  component  of  the  vector  has  unit  variance.  Given  the  signal 
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vector  S.  the  joint  density  function  on  the  received  vector  under  hypothesis  Hi  is 

Hi)  =  fo(r  -  6  s). 

(2.37) 

The  power  function  for  the  locally  optimum  test  was  given  in  the  previous  section 
in  equation  (2.22).  However,  in  the  random  signal  case  the  unconditional  density 
function  /r(l\Hi)  is  obtained  by  integrating  out  the  random  vector  5  from  the  joint 

PDF  /r,s(£,  s|#i)  =  Hi)fs(s).  Use  of  equation  (2.37)  results  in 

too  roo 

0(0)  =  /  /  4>(r)fD(L  -  9s)fs(s)dr  ds. 

J  —  OO  J  —GO 

(2.38) 

The  false  alarm  constraint,  is  once  again  given  by 

[  <f>(L)fR(r\H0)dr  =  a. 

J —OO 

(2.39) 

As  before,  we  wish  to  maximize  ^p-\e=o-  If  the  function  /£>(r  —  0s)  is  a  well’behaved 

function  such  that  its  derivative  exists  at  all  points,  then 

90(0)  [°°  f°°  ,,  Os) 

do  =U-JU  SB 

(2.40) 

It  follows  from  equation  (2.35)  that 

90(6)  |  /°°  f°°  x,  dfD(r) 

m  l#=0=  LJ-J (E)[S  drt  fdi)irds. 

(2.41) 

Because  of  the  zero  mean  assumption 

TOO 

/  skfs(s)ds  =  0. 

J— OO 

(2.42) 

We  conclude  that 

90(0),  _fi 

06  1 6=0  ~° 

(2.43) 

independent  of  the  choice  of  <j>(r).  Therefore,  to  achieve  the  maximum 

increase  of 

the  power  function  in  the  vicinity  of  the  origin,  we  maximize  d  qJi'*  |g=o. 
assuming  that  the  integration  and  differentiation  can  be  interchanged, 

As  before, 

«W)  r  , 

w  =LLm  & 

(2.44) 
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However  from  equation  (2.34) 

d2/z>(r  -  6j)  3  f%(r-fo) 

36*  dOf-l  d(rk-0skV  Sk) 


36  fcti  3(rk  -  6sk) v 

__  A  32fp(r  —  6  s)  d(rj  —  6s j ) ,  . 

fr[i~ld(rj-6sj)d(rk-6sk)  36  1  k) 

=  ff  d^-M 


(2.45) 


Hence, 


32/d(l  —  ^l),  ^^d2/o(r)  _/nT,  Jn  w  /  \  ^ 

- JE5 - U=o  =  2^ Z.  flIV11  =  (vrii  Vr)/p(r).  (2.46) 


002  fritZ  drj3rk  3  K  r/J-w  v  7 

Then  the  second  derivative  of  the  power  function  at  the  origin  takes  the  form 

^l,.o  =  /”  /”  *(d(v^,sTvr)/£(£)4U)4:di  =  /“  dfeJB.Iv^/v.i/efr)  &. 

(2.47) 

Using  the  approach  of  Lagrange  multipliers  to  maximize  the  function  in  equation 
(  2.47)  along  with  the  constraint  (  2.39),  the  optimization  problem  can  be  written  as 

max[f  <j>{r)Ea[VfssTVr]fD{r)dr  +  Tiu[a-[  0(i)/d(i)^z:]]-  (2.48) 

7— oo  7 — oo 

The  above  expression  can  be  rewritten  as 

max[  [  <?Kr)[£s[V^s  lTVr]/D(r)  -  ■qufD(r)]dr]  +  T]ua.  (2.49) 

7—00 

To  maximize  the  integral  the  decision  regions  have  to  be  chosen  such  that  the  inte¬ 
grand  is  always  nonnegative.  The  resulting  decision  regions  yield  the  inequalities 


E.Jtft  sTVr]fD(r)  <  rjufD(r). 

*0 


(2.50) 


If  the  covariance  matrix  of  the  signal  vector  is  denoted  by  P,  then  the  locally  optimum 


detector  can  be  written  as 


(V*PVr)/fi( r)  «■ 

Wr)  — W) — 


(2.51) 


which  is  identical  to  equation  (2.20). 

As  a  general  rule  for  deriving  locally  optimum  tests,  note  that  we  maximize  at  the 
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origin  the  first  non-vanishing  derivative  of  the  power  function.  For  the  known  and 
the  purely  random  signal  cases  the  first  non-vanishing  derivative  is  the  first  and  the 
second  derivative,  respectively. 

2.6  Special  Cases 

In  this  section  LOD  structures  will  be  derived  for  three  special  cases.  In  the  first  it 
is  assumed  that  the  N  random  variables  in  the  disturbance  vector  D_  are  statistically 
independent.  With  this  assumption,  the  joint  PDF  of  the  N  random  variables  is  the 
product  of  the  marginal  density  functions  of  the  individual  random  variables.  In  the 
second,  the  N  random  variables  are  modeled  as  arising  from  an  SIRP.  This  model 
enables  us  to  write  the  joint  PDF  of  the  random  variables  analytically,  accounting 
for  the  correlation  between  the  random  variables.  In  the  last  case  the  N  correlated 
random  variables  are  assumed  to  be  jointly  Gaussian.  The  locally  optimum  detector 
structures  are  derived  for  all  cases.  It  turns  out  in  all  three  cases  that  the  detector 
can  be  expressed  in  a  canonical  form.  This  canonical  expression  is  derived  for  both 

the  known  and  the  random  signal  problems. 

2.6.1  The  Known  Signal  Problem 

2.6. 1.1  Disturbance  Modeled  as  Independent  Random  Variables 

From  equation  (  2.32),  the  LOD  structure  in  the  known  signal  case  is  given  as 


dj  I 

/o(l) 


e=o  § 
—  <Vk- 
H0 


(2.52) 


Let  the  N  random  variables  in  the  vector  D  be  independent  where  the  PDF  of  the 
itfl  random  variable  is  foi  (dt).  Therefore,  the  conditional  joint  density  functions  of 
the  N  received  random  variables  axe  given  by 


fRuRi . Kw(rijr2,...,rw|tf0)  =  UfoXn) 

t=l 

N 

f Ri,Ri . **(n,r2,  ...,nv|tfi)  =  JJ  fDi(ri  -  0Si). 


(2.53) 


(2.54) 


The  numerator  in  the  ratio  test  of  equation  (2.52)  is  evaluated  as 

0fjAlrd) i«-. = -mu- = <2.55) 
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Thus,  from  equation  (2.52)  the  LOD  statistic  for  independent  random  variables  is 
given  by 

TLOD(ri,r2,...,rN)  =  —  *|  (2.56) 

t= 1  JDi\ri) 

where  /©.(r,)  denotes  the  derivative  of  /©^(r,)  with  respect  to  r,-.  The  above  equation 
for  the  LOD  statistic  is  the  canonical  form  obtained  when  the  random  variables  are 
independent.  For  different  density  functions,  /©,(r,-),  the  detector  will  be  different, 
although  its  structure  remains  the  same.  The  canonical  form  of  the  detector  is  shown 
in  Fig.  2.1. 

2. 6. 1.2  Disturbance  Modeled  as  an  SIRV 

When  the  random  variables  of  the  disturbance  are  drawn  from  a  zero  mean  SIRP 
distribution,  the  joint  PDF  can  be  written  as 

fo{d)  =  2^n/2\m  |i/2  ^n(p)  (2.57) 

where  p  =  dT  M~xd,  M  is  the  covariance  matrix  for  the  N  random  variables  and 
hpj(p)  is  a  positive  valued,  nonlinear  function  of  p.  The  numerator  of  the  ratio  test 
in  equation  (2.52)  is  then  given  by 

d/g(r  ~ fg)'.  _d_r  1  .  U|  1  d 

39  'e_0  d9''2irNl2\M\ll2 39^hN^'e=0‘ 

(2.58) 

where  the  quadratic  form  p  equals  (r  -  9s)T M~x{r  —  Os)  since  d  =  r  —  6s.  From  the 
chain  rule  for  differentiation  we  have 

|(MP»  =  £(Wri)  g-  (2-59) 

From  the  expression  for  p 

^\e=o  =  -2(srM-1r).  (2.60) 

Making  use  of  equations  (2.58-2.60)  the  LOD  statistic  in  equation  (2.52)  becomes 

Tlod{l )  =  — 2(srM~1r)  (2.61) 

hN{p) 

where  hN(p)  denotes  the  derivative  of  the  function  hj^(p)  with  respect  to  the  argument 
p.  The  LOD  statistic  in  equation  (2.61)  represents  the  canonical  structure  when 
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tintuWM  (■•M  1 M  ft  ■  wTmW!]  m  MJ  lUMiull 


variables. 


the  disturbance  is  modeled  as  an  SIRV.  The  nonlinear  function  h^ip)  depends  on 

the  particular  joint  density  function  used  to  model  the  disturbance.  The  canonical 

structure  for  the  detector  is  shown  in  Fig.  2.2.  Note  that  the  detector  multiplies  the 

output  of  a  matched  filter  with  the  output  of  a  nonlinearity.  Just  as  with  a  Gaussian 

receiver,  the  matched  filter  maximizes  the  signal  to  disturbance  ratio  even  though 

the  received  signal  is  non-Gaussian  (i.e.,  derivation  of  the  matched  filter  to  maximize 

signal  to  noise  ratio  does  not  depend  on  the  Gaussian  assumption).  For  non-Gaussian 

problems,  matched  filtering  alone  is  suboptimum.  For  SIRPs  the  optimal  receiver 

requires  nonlinear  processing  as  well  as  matched  filtering. 

2.6. 1.3  Random  Variables  Arising  from  the  Gaussian  Distribution 

The  SIRP  class  of  distributions  reduces  to  the  Gaussian  distribution  when 

Hn(p)  =  e”2 .  (2.62) 


It  follows  that 

fyv(p)  _  _i 

hu(p)  2 

With  reference  to  equation  (2.61),  the  LOD  statistic  becomes 

TLod(l)  =  sTM~lr. 


(2.63) 


(2.64) 


Interestingly  enough,  this  is  identical  to  the  statistic  of  the  likelihood  ratio  test  for 
the  known  signal  Gaussian  problem  [30].  Hence,  for  the  known  signal  Gaussian  prob¬ 
lem,  the  strong  and  the  weak  signal  detectors  are  identical.  Note  that  there  is  no 
nonlinearity  involved  with  the  weak  signal  detector  for  this  case.  To  put  it  another 
way,  the  general  nonlinear  SIRP  weak  signal  detector  of  Fig.  2.2  reduces  to  the  linear 

receiver  or  matched  filter  known  to  be  optimum  for  the  Gaussian  problem. 

2.6.2  The  Random  Signal  Problem 

2. 6. 2.1  Independent  Disturbance  Random  Variables 

The  locally  optimum  detector  is  given  by  equation  (  2.51)  when  the  signal  is  ran¬ 
dom.  Repeating  equation  (2.51)  the  LOD  structure  is 


Tlod(l )  = 


(V^PVr)/p(r) 

fair) 


(2.65) 
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P  is  the  random  signal  covariance  matrix.  In  this  section  the  components  of  the  dis¬ 
turbance  vector  D  are  assumed  to  be  statistically  independent.  The  analysis  is  further 
simplified  when  the  signal  random  variables  are  also  assumed  to  be  uncorrelated.  The 
covariance  matrix  P  then  becomes  diagonal.  Let  the  diagonal  elements  of  the  matrix 
P  be  represented  by  of,  i  =  1,2,  ...,7V.  Because  the  disturbance  random  variables 
are  independent,  the  joint  density  function  /d(l)  is  again  given  by  the  product  of  the 
marginal  density  functions  of  the  individual  random  variables.  Specifically, 

/j>(d=n  .faM-  (2-6«) 

«=i 


Also,  when  P  is  diagonal, 


(2.67) 


Using  equations  (2.65-2.67)  and  following  the  same  steps  as  in  the  known  signal  case, 
the  LOD  statistic  can  be  derived  as 


Tlod{l)  =  i^cr- 


fD,{ri) 


(2.68) 


where  the  double  prime  indicates  second  derivative  with  respect  to  the  argument. 
The  canonical  structure  derived  above  is  shown  in  Fig.  2.3. 

2. 6. 2. 2  Disturbance  Random  Variables  Arising  from  an  SIRP  Distribution 

When  the  disturbance  vector  is  modeled  as  having  an  SIRP  distribution,  the  joint 
PDF  is  given  by  equation  (2.57).  The  LOD  structure  for  the  random  signal  case  is 
given  by  equation  (2.65).  Since  the  constant  terms  in  the  joint  density  function  cancel 
out  in  the  numerator  and  denominator  of  the  ratio  test  in  equation  (2.65),  the  LOD 
statistic  is  obtained  by  evaluating 


Tlod{l) 


(v  rpyr)Mp) 

Mp) 


(2.69) 


The  numerator  of  equation  (2.69)  can  be  expanded  as  a  sum  of  terms  involving  partial 
derivatives.  The  result  is  simplified  considerably  when  the  covariance  matrix  P  of  the 
signal  vector  is  diagonal.  When  P  is  chosen  to  be  the  Identity  matrix  (i.e.  P  is 
diagonal  and  the  variance  of  each  element  of  the  signal  vector  is  unity),  the  LOD 
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statistic  is  given  by 


Tlod{l )  = 


(V^V  r)Mp) 

hN(p) 


(2.70) 


The  inner  product  involving  the  V  vector  can  be  written  as 

N  82 

VTVr  =  T - • 

r  r  4-^  dr* 

t=i  u  < 


(2.71) 


Apphcation  of  equation  (2.71),  to  the  the  numerator  of  equation  (2.70)  results  in 


^d2hN(p)  ^  ,  ^d2p  w  dp  2 


(2.72) 


where  the  prime  indicates  differentiation  with  respect  to  p.  Using  equation  (2.72)  and 
dividing  by  h^{p)  the  LOD  statistic  becomes 

Wd = ik + <2J3> 


The  quadratic  form  p  can  be  written  as 


N  N 

P  =  ulri 

k= 1  /=1 


(2.74) 


where  Mkil  represents  the  kth  row  and  Ith  column  entry  of  the  matrix  M  1 .  From 
equation  (2.74)  (j£)2  and  0  can  be  calculated.  In  particular,  we  have 


dp  d  "  "  A/f_a 

7T-  =  J2  22  r*Mki  ri 

dn  dri  ^  fri 

=  +  J2r‘Mii1- 

k=  1  /=! 


(2.75) 


Because  of  the  symmetric  nature  of  the  matrices  M  and  M  1 ,  =  M,^1.  It  follows 

that  the  square  of  the  above  equation  is  then  given  by 


(hr?  =  (EnM^  +  '£rlM;y  =  (2'£rl,M^ 

°ri  k= 1  /=1  k= 1 

=  4  Z-£nMjM;'r,  =  4 £  jtnMtfM;' r, 

Jfc=l  /=1  A:= 1  i=l 


(2.76) 
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Utilizing  equation  (2.75), 


Z7  =  JoGt2ri'MH'+'ElriMZ1)  =  2MZ1. 

Uri  UT<  k=l  1=1 

With  reference  to  equations  (2.73)  and  (2.75-2.77)  define 

j'W  (r\  _  1  ry^  u’  f.A.oMElr  jw-i 

LOdU  Mdrf]  2Mp)SM"  ' 

rp(2)  (  \  _  1  *  w$Pxa_X(p)£v^ 

W£)  ■  MrtSM?)(s:)  ■4MrtSSirr‘  “  i,n 


(2.77) 


(2.78) 


hjv(p) 


(2.79) 


The  locally  optimum  detector  statistic  that  results  from  equations  (2.78-2.79)  is 
written  as 

Tlod(l)  =  T$d(l)  +  rgD(r) 

=  ^-y[2/i’\.(pj(r(A/~‘)  +  4/ij(p)£rM"1:ir1r]  (2.80) 

where  tr(M~l )  is  the  sum  of  the  all  the  diagonal  elements  of  the  matrix  M-1.  The 
canonical  structure  of  the  receiver  is  shown  in  Fig.  2.4. 

2. 6. 2. 3  Disturbance  Random  Variables  Arising  from  the  Gaussian  Distribution 
As  pointed  out  in  section  2. 6. 1.3,  the  SIRP  distribution  reduces  to  the  Gaussian 


distribution  when 


hN(p)  =  e  §. 


(2.81) 


From  the  above  equation  it  follows  that 


hN(p) 

hN(p) 

=  1 
hN(p)  4' 


(2.82) 


Consequently  equation  (2.78)  reduces  to 


^lod(h)  —  —tr(M  *) 


(2.83) 
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Figure  2.4:  Canonical  form  of  LOD  assuming  random  signal  and  random  disturbance 
arising  from  an  SIRP. 
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whereas  equation  (2.79)  becomes 


Tlod(l)  =  rT  M~1M~1r.  (2.84) 

Note  that  TiQD(r )  is  a  constant  that  can  be  combined  with  the  threshold.  As  a  result, 
the  LOD  statistic  for  the  random  signal  Gaussian  problem  is  given  by 

TLod(l)  =  rTM-1M-1r.  (2.85) 

Unlike  the  known  signal  Gaussian  problem,  the  weak  signal  LOD  statistic  does  not 
equal  the  statistic  of  the  likelihood  ratio  for  the  random  signal  Gaussian  problem  [30] 
which  is 

TLr{l)  =  rT[M-1  —  (M  +  P)_1]r.  (2.86) 

The  two  statistics  become  equivalent  only  when 

M  =  P  =  I.  (2.87) 

Although  the  strong  and  weak  signal  detectors  have  different  test  statistics,  the  re¬ 
ceiver  structures  are  both  quadratic  in  nature. 
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Chapter  3 

Determining  Thresholds  for  the 
Locally  Optimum  Detector 


3.1  Literature  Review 

In  this  dissertation,  multivariate  density  functions  for  modeling  non-Gaussian  prob¬ 
ability  density  functions  are  assumed  to  be  known.  They  are  obtained  using  the  theory 
of  spherically  invariant  random  processes.  Once  the  multivariate  density  functions 
are  known,  we  derive  a  decision  rule  using  the  theory  of  locally  optimum  detectors, 
that  is  applicable  when  the  signal  to  be  detected  is  weak  compared  to  the  additive 
disturbance.  The  procedure  for  obtaining  the  decision  rule  is  explained  in  detail  in 
Chapter  2. 

However,  the  detector  that  is  obtained  on  the  basis  of  the  theory  of  locally  opti¬ 
mum  detectors  is  typically  nonlinear  as  the  underlying  processes  are  non-Gaussian. 
When  the  test  statistic  is  nonlinear,  it  is  not  possible  to  evaluate  the  performance  of 
the  detector  analytically.  Consequently,  we  have  to  resort  to  computer  simulations 
to  analyze  the  performance.  There  are  two  steps  involved  in  computer  simulations  to 
analyze  performance.  The  first  step  is  to  evaluate  the  threshold  so  as  to  obtain  the  de¬ 
sired  false  alarm  probability.  The  second  step  is  to  evaluate  the  detection  probability 

once  the  threshold  is  set,  corresponding  to  the  desired  false  alarm  probability. 

3.1.1  Classical  methods  for  evaluating  thresholds 

Monte  Carlo  methods  have  typically  been  used  for  this  purpose.  A  large  number 
of  trials  M  are  generated  under  the  hypothesis  that  the  received  signal  consists  of  the 
disturbance  alone.  The  output  of  the  detector,  TSi  i  =  1,2,  ...,M  corresponding  to 
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the  generated  disturbance  vectors  are  recorded.  Based  on  the  output  of  the  detector, 
thresholds  can  be  set  to  obtain  the  desired  false  alarm  probability.  But,  in  order 
to  establish  the  threshold  for  a  specified  Pp,  it  is  necessary  to  accurately  know  the 
behavior  of  the  tail  of  the  test  statistic.  Unfortunately,  the  number  of  trials  required 
for  the  Monte  Carlo  technique  is  very  large,  as  is  evident  from  the  rule  of  thumb 

M  >  (3.1) 

Hence,  if  Pp  —  10-5,  one  million  trials  should  be  generated.  Clearly  this  is  not  a  very 
desirable  situation.  Thus,  for  a  reasonable  sample  size  M,  estimation  of  thresholds 
corresponding  to  small  false  alarm  probabilities  cannot  be  made  when  these  methods 
are  used.  For  Monte  Carlo  simulations  the  construction  of  approximate  confidence 
intervals  for  the  threshold  estimates  based  on  various  estimators  are  discussed  by 
Hosking  and  Wallis  [31]. 

Some  other  approaches  which  do  not  make  use  of  raw  data  or  their  smoothed 
versions  have  been  suggested  by  various  authors.  For  example,  Harrel  and  Davis 
[32]  suggested  using  linear  combinations  of  sample  order  statistics.  Their  approach 
appears  to  be  applicable  for  estimation  of  thresholds  in  the  central  region  of  the  distri¬ 
bution.  However,  the  underlying  threshold  estimator  is  not  in  a  simple  computational 
form  and  does  not  offer  any  additional  advantage  over  the  Monte  Carlo  method  in 
terms  of  threshold  estimation  corresponding  to  small  false  alarm  probabilities. 

It  has  recently  been  shown  [33]  that  the  PDFs  of  the  test  statistic  can  be  determined 
experimentally  using  a  relatively  small  number  of  samples  (eg:  50-100  samples  give 
accurate  fits  depending  on  the  distribution).  Because  the  number  of  samples  required 
by  Ozturk’s  technique  is  small,  it  is  unlikely  that  samples  will  be  from  the  extreme 
tails  of  the  PDFs.  Consequently,  the  accurate  fit  mentioned  above  applies  to  the  main 
body  of  the  density  function. 

A  number  of  statisticians  have  developed  methods  for  estimating  the  extreme  tail 
of  the  distributions  using  the  asymptotic  properties  of  extreme  order  statistics.  As¬ 
suming  an  unknown  probability  density  function  fx{x),  then  for  large  X  Hill  [34] 
proposed  using  A(x)  =  1 cx~a  as  a  limiting  form  of  the  distribution  function  to 
infer  the  tail  behavior.  A  similar  approach  is  also  given  by  Pickands  [35].  Weiss- 
man  [36]  proposed  a  different  approach  based  on  the  joint  limiting  distribution  of  the 
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largest  k  order  statistics.  His  approach  is  based  on  the  fact  that  “the  largest  k  order 
statistics  have  a  joint  distribution  that  is  approximately  the  same  as  the  largest  k 
order  statistics  from  a  location  and  scale  exponential  distribution  ”.  Weissman  ob¬ 
tained  a  simple  expression  for  the  estimate  of  the  thresholds  corresponding  to  various 
false  alarm  probabilities  of  the  distributions.  Based  on  the  empirical  comparisons, 
Boos  [37]  reported  that  Weissman ’s  estimators  have  lower  mean  squared  error  than 
those  of  standard  methods  when  the  tails  are  exactly  exponential.  When  the  tails 
are  not  exactly  exponential  the  estimators  become  highly  biased.  The  mean  squared 
errors  of  the  estimators  strongly  depend  on  the  choice  for  k.  Although  the  method 
is  non-parametric  in  nature,  the  optimal  choice  of  k  requires  the  knowledge  of  the 
parent  distribution. 

The  use  of  stable  distributions  to  model  data  having  large  tailed  distributions  has 
attracted  considerable  attention  [38],  [39], [40],  and  [41].  The  independent  and  iden¬ 
tically  distributed  random  variables  Yi,Y2,  ...Yn  are  said  to  have  a  stable  distribution 
if  Yi  -f  Y2  +  ...  +  Yn  has  the  same  distribution  as  the  individual  random  variables. 
With  the  introduction  of  additional  parameters,  control  of  the  mean,  variance  and 
the  skewness  of  the  distribution  is  possible.  A  major  difficulty  with  stable  distribu¬ 
tions  is  that  they  usually  cannot  be  expressed  in  closed  form.  Also,  estimation  of 
parameters  is  not  computationally  easy  [42]. 

3.2  Extreme  value  theory 

Guida  et  al.  [43]  compared  the  performance  analysis  of  some  extrapolative  estima¬ 
tors  of  probability  tails  with  application  to  radar  systems  design.  They  show  that  the 
estimates  based  on  the  extreme  value  theory  yield  clearly  superior  accuracy,  while 
achieving  a  substantial  savings  in  sample  size  compared  to  the  classical  Monte  Carlo 
techniques.  However,  their  method  suffers  from  two  major  drawbacks.  First,  they 
assumed  that  the  underlying  unknown  distribution  is  always  exponential  in  nature. 
This  assumption  can  be  restrictive  in  certain  situations.  The  other  drawback  is  that 
the  samples  are  partitioned  into  many  smaller  sets  of  samples  and  the  maximum  from 
each  set  is  drawn  for  estimation  purposes.  They  provide  no  optimum  rule  for  deter¬ 
mining  the  number  of  sets  to  be  used  in  partitioning  the  original  sample  even  though 
the  accuracy  of  the  estimation  depends  strongly  on  the  original  sample  size  and  the 
number  of  sets. 
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Pickands  [35]  first  suggested  that  the  Generalized  Pareto  distribution  can  be  used 
to  model  the  extreme  tails  of  probability  density  functions.  The  Generalized  Pareto 
Distribution  (GPD)  is  a  two  parameter  distribution,  with  a  scale  and  a  shape  param¬ 
eter.  Modeling  the  extreme  tail  then  corresponds  to  estimating  the  two  parameters 
of  the  GPD.  The  estimation  methods  for  the  Generalized  Pareto  Distribution  have 
been  reviewed  by  Hosking  and  Wallis  [31].  They  considered  the  method  of  moments, 
probability  weighted  moments  and  the  maximum  likelihood  method  for  estimating 
the  parameters  and  the  thresholds.  Based  on  computer  simulation  experiments,  they 
showed  that  the  probability  weighted  moment  method  is  more  reliable  than  the  max¬ 
imum  likelihood  method  for  sample  sizes  less  than  500. 

3.3  The  Radar  Problem 

The  hypothesis  testing  problem  for  deciding  whether  or  not  a  target  is  present  is 
given  by  equations  (2. 1-2.2)  in  Chapter  2.  For  weak  signal  applications,  it  was  shown 
that  the  Locally  Optimum  Detector  is  useful  for  arriving  at  a  decision  rule.  For  the 
known  signal  case,  the  LOD  structure  is  given  by  equation  (2.32).  Since  the  test 
statistic  is  typically  a  nonlinear  function  when  /d(z l\Ho)  and  fD_(x\H\)  are  multivariate 
non-Gaussian  density  functions,  it  is  not  possible,  in  general,  to  analytically  evaluate 
in  closed  form  the  threshold  77  for  a  specified  false  alarm  probability.  Given  the 
probability  density  functions  (PDF)  of  the  test  statistic,  Ts ,  under  hypotheses  Hi 
and  H0,  the  detection  and  false  alarm  probabilities  are 

Pd=  r  fxMHi)dta  (3.2) 

J  *? 

Pf=  fww  (3.3) 

Jr) 

Pd  and  Pp  are  represented  by  the  shaded  areas  shown  in  Fig.  3.1.  As  indicated  in 
the  figure  Pp  is  typically  much  smaller  than  Pp>- 

In  practice,  the  density  function  of  Ta  is  not  known  in  advance.  For  example,  de¬ 
pending  upon  various  conditions  such  as  terrain,  weather  etc.,  the  clutter  may  be  best 
modeled  by  Gaussian,  K-distributed,  Weibull  or  some  other  probability  distribution. 
In  this  Chapter  a  new  approach  is  developed  for  experimentally  determining  the  ex¬ 
treme  tail  of  fT.(ts\Ho),  where  the  number  of  samples  required  is  several  orders  of 
magnitude  smaller  than  that  suggested  by  equation  (3.1).  Once  the  tail  of  /r#(f,|J/o) 
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Figure  3.1:  Shaded  areas  indicating  Pf  and  Pd- 
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has  been  estimated,  the  threshold  can  be  determined  by  use  of  equation  (3.3). 

3.4  Methods  for  Estimating  Thresholds 

3.4.1  Estimates  Based  on  Raw  Data 

In  this  section  we  consider  some  commonly  used  threshold  estimates.  These  esti¬ 
mates  are  called  raw  estimates  and  are  already  included  in  some  statistical  package 
programs  (eg:  the  UNIVARIATE  procedure  in  the  SAS  [44]  package). 

Let  X\  <  X<i  <  ...  <  Xn  denote  the  sample  order  statistics  from  a  distribution 
function  F(x).  Let  p  denote  the  desired  false  alarm  probability.  Also,  let  n(l  —  p)  — 
j  +  g  where  j  is  the  integer  part  of  n(l  —  p).  We  denote  the  threshold  estimate  based 
on  the  kth  procedure  to  be  described  below  by  rj^K  Four  different  threshold  estimates 
are  given  as  follows: 

—  (1  —  g)Xj  +  gXj+i  (3.4) 

gj?'  =  Xk,  where  k  is  the  integer  part  of  [n(l  —  a)  +  1/2]  (3.5) 

n ;i3)  =  (1  —  S)Xj  +  SXj+i,6  =  0  i/ ^  =  0;  £  =  1  if  g  >  0  (3.6) 

V{a4)  =  6Xj+1+(l-6)(X:+Xj+1)/2,  6  =  0ifg  =  0;6=Ufg>0.(3.7) 

It  is  known  that  all  of  the  above  methods  are  asymptotically  equivalent.  Thus,  if  a 
large  sample  size  is  used  (where  for  example  Mis  determined  from  equation  (3.1)), 
the  choice  of  the  best  method  is  no  longer  critical.  However,  in  an  empirical  study 
[37],  it  has  been  shown  that  gjf)  outperformed  the  other  estimators  when  g  =  0.  It  is 
noted  that  the  methods  based  on  the  above  estimators  are  restricted  by  the  condition 
that  1  <  n(l  —  a)  <  n  —  1.  This  implies  that  the  smallest  value  of  the  false  alarm 
probability  a  cannot  be  lower  than  1/n.  Consequently,  the  threshold  corresponding 
to  the  smallest  false  alarm  probability  which  can  be  estimated  by  these  procedures 
depends  on  the  sample  size.  Thus,  for  a  reasonable  size  of  n,  estimation  of  thresholds 

for  very  small  false  alarm  probabilities  cannot  be  made  when  these  methods  are  used. 

3.4.2  Estimates  Motivated  by  the  Extreme  Value  Theory 

Extreme  value  distributions  are  obtained  as  limiting  distributions  of  largest  (or 
smallest)  values  of  sample  order  statistics.  Assuming  independent  trials,  if  X\  < 
X2  <  ...  <  Xn  are  order  statistics  from  a  common  distribution  function  F(x),  then 
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the  cumulative  distribution  function  of  the  largest  order  statistic  is  given  by 


Gn(x)  =  P(Xn  <x)  =  [F(x)}\  (3.8) 

It  is  clear,  as  n  — >  oo,  that  the  limiting  value  of  Gn(x)  approaches  zero  if  F(x)  is 
less  than  1  and  unity  if  F(x)  is  equal  to  1  for  a  specified  value  of  x.  A  standardized 
limiting  distribution  of  Xn  may  be  obtained  by  introducing  the  linear  transformation, 
anXn  +  6n,  where  an  and  bn  are  finite  constants  depending  on  the  sample  size  n. 

In  Appendix  A,  using  the  theory  of  limiting  distributions  [45],  it  is  shown  that  if 
there  exist  sequences  an  and  6n  such  that 

V  _  k 

lim  P(— - -  <  x)  =  lim  Fn(anx  +  bn)  =  lim  Gn(anx  +  bn)  — ►  A(x)  (3.9) 

n— ►oo  a  n— ►  oo  n— >oo  N  \  /  \  / 
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then  the  solution  of  equation  (3.9)  yields  all  the  possible  forms  for  the  distribution 
function  Gn(x)  in  the  limit  as  n  — *  oo.  The  solutions  to  the  above  equation  are 
derived  in  Appendix  A  and  are  rewritten  here: 


A(x)  =  exp(— e~x)  x  >  0 

(3.10) 

A(x)  =  exp(—x~k)  x  >  0,  k  >  0 

(3.11) 

A(x)  =  exp(— (— x)k)  x  <  0,  k  >  0. 

(3.12) 

In  the  limit,  as  n  gets  large,  these  are  the  three  types  of  distribution  functions  to  which 
the  largest  order  statistic  drawn  from  almost  any  smooth  and  continuous  distribu¬ 
tion  function  converge.  By  differentiating  the  three  functions,  we  obtain  analytical 
expressions  for  the  limiting  forms  of  the  probability  density  functions.  However,  be¬ 
cause  of  the  differentiation,  it  should  be  recognized  that  these  expressions  may  not 
be  good  approximations  to  the  density  functions.  In  practice,  extreme  value  theory 
should  always  be  applied  to  a  distribution  function,  or  equivalently,  the  area  under 
the  density  function.  For  x  >  0,  differentiation  of  equations  (3.10)  and  (3.11)  result 
in 

1.  *  H(x)  =  e~*  (3.13) 

ax 

2.  ^&kH(x)  =  kx~(k+1)  k>  0.  (3.14) 

ax 
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The  first  equation  above  is  the  well  known  exponential  distribution  and  the  second 
equation  is  related  to  the  Pareto  distribution.  The  details  that  lead  to  the  these 
analytical  expressions  are  shown  in  Appendix  A. 

It  remains  to  be  explained  how  the  distribution  of  the  largest  order  statistic  is 
related  to  the  tails  of  the  underlying  PDF  from  which  the  samples  are  drawn.  The 
relationship  is  based  on  the  observation  that  inferences  from  short  sequences  are  likely 
to  be  unreliable.  In  particular,  instead  of  observing  k  sets  of  n  samples  and  taking 
the  largest  order  statistic  from  each  of  the  k  sets,  it  is  better  to  observe  a  single 
set  of  nk  samples  and  use  the  largest  k  samples  from  this  set  [46],  The  k  largest 
order  statistics  from  a  vector  of  nk  observations  constitute  the  tail  of  the  underlying 
distribution  especially  when  n  is  very  large.  Therefore,  the  limiting  distribution  of 
the  largest  order  statistic  closely  approximates  the  tail  of  the  underlying  PDF  for 
large  n. 

3.5  The  Generalized  Pareto  Distribution 

The  Generalized  Pareto  Distribution  (GPD)  is  defined  for  x  >  0  by  the  distribution 
function 

G(x)  —  1  —  (1  +  7i/cr)-1^7,  — oo  <  7  <  oo,  cr  >  0,71  > —cr.  (3.15) 

This  distribution  has  a  simple  closed  form  and  includes  a  range  of  distributions  de¬ 
pending  upon  the  choice  of  7  and  a.  For  example,  the  exponential  distribution  results 
for  7  =  0  and  the  uniform  distribution  is  obtained  when  7  =  —  1.  The  GPD  defined 
in  equation  (3.15)  is  valid  for  all  x  >  0  while  equations  (3.13)  and  (3.14)  are  valid 
only  for  large  x. 

The  probability  density  function  corresponding  to  the  GPD  is  given  by 

*(*)  =  - (1 + t  )~lh]  =  ;(1  +  (3,16) 

If  we  let  7  — *  0  in  the  above  equation,  note  that 

lim  1(1  + If  )"^-1  =  Ic-I.  (3. 

*  7— ►O  <7 x  7  <7  v 
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Also,  if  we  let  x  be  large  in  equation  (3.16),  note  that 


1  1  ,7  >  _i  _i_i 

-<1  +  — )  -r  1  «  -(-)  -r  x  i  \ 
a  a  a  a 


(3.18) 


Equations  (3.17)  and  (3.18)  are  of  the  same  form  as  equations  (3.13)  and  (3.14). 
Thus,  the  GPD  can  be  used  to  approximate  both  types  of  tail  behavior  exhibited  by 
the  right  tail.  Typical  plots  of  the  Generalized  Pareto  PDF  for  7  <  0  and  7  >  0  axe 
shown  in  Figures  3.2  &  3.3. 

We  wish  to  set  thresholds  for  specified  false  alarm  probabilities  when  the  underlying 
density  functions  are  unknown.  To  set  very  small  false  alarm  probabilities,  the  tail 
of  the  PDF  fjf(ta\Ho)  has  to  be  accurately  modeled.  Figure  3.4  represents  a  typical 
PDF  of  the  test  statistic  with  the  tail  region  of  the  PDF  being  defined  as  that  to 
the  right  of  t  =  t0.  Figure  3.5  shows  the  tail  translated  to  the  origin.  The  choice  for 
t0  is  somewhat  arbitrary.  For  example,  t0  can  be  chosen  such  that  the  area  in  the 
shaded  region  equals  0.1,  0.05  or  0.01.  It  is  the  portion  of  the  PDF  to  the  right  of 
t0  that  we  are  interested  in  modeling  by  the  GPD.  In  particular,  the  tail  region  of 
the  PDF  is  translated  to  the  origin  and  modeled  as  a  GPD.  Once  the  estimates  of  a 
and  7  have  been  obtained,  the  GPD  is  multiplied  by  the  area  of  the  shaded  region 
and  translated  back  to  the  point  t0.  In  this  way,  the  area  under  the  PDF  of  the  test 
statistic  is  maintained  at  unity. 

3.5.1  Methods  for  Estimating  the  Parameters  of  the  GPD 

Suppose  that  the  sample  ordered  statistics  X\  <  Xi  <  ...  <  Xn  are  drawn  from 
the  distribution  function  F(x).  To  estimate  the  right  tail  of  this  distribution  it  is 
necessary  to  determine  a  value  (say  xo)  and  then  use  those  sample  observations  which 
are  greater  than  Xo  to  obtain  the  quantity  z  =  x  —  x 0.  Once  the  tail  observations  have 
been  chosen,  the  Generalized  Pareto  Distribution  can  be  fitted  to  these  observations 
by  using  standard  methods  of  parameter  estimation.  Observe  that  the  portion  of  the 
observations  used  from  a  complete  set  of  samples  depends  on  the  choice  of  x0.  One 
approach  to  selecting  x0  is  to  make  a  histogram  of  the  data  set  and  choose  x0  to  be 
near  the  point  of  inflection  of  the  histogram.  DuMouchel  [47]  proposed  choosing  x0 
to  be  the  value  such  that  fx(x)dx  =  0.1.  Such  an  approach  is  less  subjective  and 
appears  to  be  satisfactory  for  many  applications.  However,  it  is  noted  by  DuMouchel 
that  “  using  an  even  smaller  fraction  of  observations  would  restrict  profitable  use  of 
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PT(t/Ho) 
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the  statistic  to  much  larger  sizes.  On  the  other  hand,  to  use  more  than  the  upper  one 
tenth  of  a  sample  would  seem  to  allow  too  much  dependence  on  the  central  part  of 
the  distribution.”  In  other  words,  if  a  smaller  fraction  is  used,  we  need  larger  sample 
sizes  to  get  an  adequate  number  of  samples  for  estimation  and  if  a  larger  fraction  is 
used,  the  body  of  the  distribution  may  influence  estimation  of  the  tail. 

Let  xq  be  such  that  1  —  F(x o)  =  fx(x)dx  =  A.  The  distribution  function  to  be 
used  in  approximating  the  tail  can  be  written  as 

F(x)  =  (1  -  A)  +  XG(x  -  x0)  =  1  -  A[1  +  l(x  -  x0)}~^  x  >  x0  (3.19) 

where  G(x)  is  given  in  equation  (3.15).  Assuming  that  the  tail  of  a  given  distribution 
can  be  adequately  approximated  by  equation  (3.19),  then  the  estimation  problem  of 
the  distribution  in  the  tail  region  is  reduced  to  estimation  of  the  parameters  of  the 
Generalized  Pareto  distribution. 

In  this  chapter  we  consider  three  methods  for  the  parameter  estimation  of  the  Gen¬ 
eralized  Pareto  distribution.  The  three  methods  are  maximum  likelihood  estimation, 
the  method  of  probability  weighted  moments,  and  the  ordered  sample  least  squares 
approach.  The  first  two  methods,  applied  to  the  GPD,  are  discussed  by  Hosking  and 
Wallis  [31].  The  ordered  sample  least  squares  approach  is  a  new  technique  developed 
in  this  work.  The  performance  of  the  three  estimation  procedures  are  compared  on 

the  basis  of  estimation  bias  and  mean  square  error. 

3. 5. 1.1  Maximum  Likelihood  Estimation 

The  probability  density  function  corresponding  to  the  GPD  from  equation  (3.16), 

with  x  replaced  by  z,  is 

g(z)  =  -(  l  +  (3.20) 

( 7  <J 

Given  a  sample  vector  [zi,z2,...,zm]  from  the  GPD  the  joint  density  function  Lz(z) 
of  the  m  samples,  assuming  independence,  is  given  by 

Lz{z)  =  —  fit1  +  (3-21) 

°  t=i  Q 

To  theoretically  obtain  the  maximum  likelihood  estimates  of  a  and  7,  the  logarithm 
of  the  joint  density  function  in  equation  (3.21)  is  differentiated  with  respect  to  a  and 
7,  respectively,  and  the  derivatives  are  set  to  zero.  Let  the  m  largest  observations 
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from  the  unknown  distribution  whose  tail  is  being  modeled  by  the  GPD  be  placed  in 
the  vector  [xn_m+i,  xn_m+2,  ...,x„].  Translation  of  the  tail  region  to  the  origin  results 
in  the  vector  [xn_m+i  -  x0,  Zn-m+2  -  *0, xn  -  *0]  =  [*i,  *2,  •,  zm}.  Letting  r  =  7/(7 
in  equation  (3.21)  and  differentiating  the  logarithm  of  the  joint  density  function  with 
respect  to  a  we  get 

A  A  m 

~Jal°9  L-^  r  l°9  +  ^  +  (r<T)-1)  £ l°9(l  +  T**')] 

Ttl  I  m 

=  —  +  (1  -  — )  £  log(  1  +  7  Zifa).  (3.22) 

a  <t*t  i=1 

By  setting  equation  (3.22)  to  zero  and  solving  for  a  we  obtain 

m 

<7(T)  =  H  log(  1  +  TZi)/(mr).  (3.23) 

t=i 

The  expression  for  cr  is  now  substituted  into  equation  (3.22),  so  as  to  obtain  a  function 
of  r  alone,  f  is  derived  by  differentiating  the  expression 
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m  log  cr(r)  +  (1  +  1  /(ct(t)t))  ^  log(  1  +  rz.)  (3.24) 

t=i 

with  respect  to  r  and  setting  the  derivative  equal  to  zero  with  the  constraint  that 
TZi  >  —  1.  However,  the  differentiation  leads  to  a  nonlinear  equation  whose  analytical 
solution  is  not  known.  This  difficulty  is  circumvented  by  minimizing  equation  (3.24) 
numerically  with  respect  to  r.  The  numerical  minimization  was  performed  using  the 
Nelder-Mead  algorithm  [48].  Once  the  estimate  for  r  has  been  obtained,  then  b  is 

obtained  from  equation  (3.23)  and  7  is  estimated  by  7  =  br . 

3. 5. 1.2  Probability  Weighted  Moments 

The  probability  weighted  moments  of  a  continuous  random  variable  Z  with  distri¬ 
bution  function  G  are  the  quantities 

=  E{Z’G'(Z)(  1  -  G(Z))']  (3.25) 

where  E  is  the  expectation  operator  and  p,  r  and  s  are  real  numbers.  For  the  GPD  it 
is  convenient  to  choose  p  =  1  and  r  =  0,  respectively.  Then  the  probability  weighted 
moments  are 

Mi  ft, 3  =  E[Z(  1  -  G(Z))3).  (3.26) 
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For  the  GPD  there  are  two  parameters  to  be  estimated,  a  and  7.  Substituting  s  =  0 
in  equation  (3.26),  we  get 


=  Ml, 0,0  =  E[Z]  =  r  -(1  + 

Jo  a  a 


Letting  1  +  ^  =  Y,  equation  (3.27)  results  in 


=  -2 1  (Y-l)Y---ldY 

-  jLri±l_rlp 

"  7sl-i  +  l  -*1' 


1  -7' 


Letting  s  =  1  in  equation  (3.26)  we  obtain 


ei  =  Mia, i  =  E[Z{  1  -  G(Z)\  =  f  -(1  +  —  )"’(1  +  —)~'~ld,Z.  (3.29) 

Jo  O’  O  O 


Letting  1  +  ^  =  Y,  as  before,  equation  (3.29)  results  in 


-  IlIll.Eif 

“  /y2  1 _ 2.  +  1  _2  Jl 

’  7  '  7 


2(2-7)' 


(3.27) 


(3.28) 


(3.30) 


The  values  of  e0  and  ej  are  obtained  from  equations  (3.28)  and  (3.30),  respectively, 
for  given  values  of  0  and  Since  there  are  two  equations  in  two  unknowns  o  and  7 
can  be  obtained  as  functions  of  e0  and  ex.  Solving  for  o  and  7  we  obtain 


<7  =  2coCi/(co  -  2ei) 


(3.31) 


7  =  2  —  eo/(eo  —  2ei) 


(3.32) 


where  e0  and  ex  are  estimated  from  the  data  by  the  estimators  e0  =  zi Jm  and 

ex  =  Ei^i(m  —  i)zi/{m(m  -  1)}  [31].  Once  the  values  of  e0  and  ei  are  obtained  the 
estimates  of  o  and  7  are  obtained  by  making  use  of  equations  (3.31)  and  (3.32).  Note 
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that  the  method  of  probability  weighted  moments  involves  computationally  simple 
expressions  for  the  estimates. 

3. 5. 1.3  The  Ordered  Sample  Least  Squares  Method  -  A  New  Approach 

The  procedure  used  in  maximum  likelihood  estimation  is  based  on  minimizing  equa¬ 
tion  (3.24).  The  probability  weighted  moment  estimates  are  obtained  by  equating  the 
sample  based  averages  with  the  theoretical  values  of  the  quantity  E[Z(  1  —  G(Z))3}, 
s=0,l,  where  Z  =  X  —  xq.  A  third  approach  is  the  ordered  sample  least  squares 
method,  which  is  based  on  the  principle  of  minimizing  the  squared  distance  between 
the  ordered  sample  and  the  expected  value  of  the  ordered  sample.  Computer  simula¬ 
tions  reveal  that  this  can  be  a  more  suitable  approach  for  estimating  the  parameters. 

In  Appendix  A  the  method  for  evaluating  the  mean  and  the  variance  of  the  rth 
order  statistic  from  a  sample  of  size  n  is  presented.  For  the  Generalized  Pareto 
Distribution  the  mean  and  the  variance  of  the  rtfl  order  statistic  can  be  derived  since 
the  probability  distribution  function  is  known  in  closed  form.  Let  x  be  replaced  by  z 
in  equation  (3.15)  and  let  G(z )  =  u.  Solution  for  z  results  in 


z  =  G-1(u)  =  -[(l-u)-"'-l).  (3.33) 

7 

Making  use  of  the  above  equation  and  equation  (A. 62)  in  Appendix  A,  the  expected 
value  of  the  rth  order  statistic  Zr  is 

E(Z,)  =  --. - TW - (3.34) 

7  (r  —  l)!(n  —  r)!  Jo 

The  integral  in  the  above  equation  can  be  broken  into  two  parts  as  follows. 

E (Zr)  =  -7 - 77^ - Til /‘(l  -  U)-V-(1  -  Uf-du  -  S'  -  «rr<H 

7  (r  —  l)!(n  —  r)!  Jo  Jo 

(3.35) 

From  results  presented  in  Gradshtyn  and  Ryzhik  [49],  the  expression  for  E(Zr )  be¬ 
comes 


E(Zr) 


cr  n\  (r  -  l)!(n  —  r  —  7)! 

—  7  (r  -  l)!(n  -  r)!  (n  -  7)! 

cr  n!(n-r-7)!  , 

7  (n  —  r)!(n  —  7)! 

_  <7  T(n +  l)r(n  -  r -7  + 1)  . 

7  T(n  —  r  +  l)T(n  -  7  +  1) 


(r  —  l)!(n  —  r)! 
n! 


(3.36) 
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To  calculate  the  variance  of  Zr,  we  first  calculate  E(Z 2).  Making  use  of  equation 
(3.33)  and  equation  (A.65)  in  Appendix  A,  the  expected  value  of  Z2  is 

E(Z']  =  ^(r  -DKn-r)!1!^1  "  (3'37> 

Expanding  the  square  in  the  integrand  of  the  above  equation  gives 

£(#)  =  $ ~r  _  !)?(»  _  r)! Ijf ' : 2(1  - »r  + : IK-'d  - u)-Vu].  (3.38) 
Making  use  of  results  from  [49],  the  above  integral  evaluates  to 


F(72\  =  a  n\  (n  —  r  — 27)!  2(n  -  r  -  7)! 

K  r)  72  (n  —  r)!  (n  -  27)!  (n  -  7)!  +  J 

_  a2  T(n  +  1)  .r(n  — r  — 27  +  I)  2T(n  —  r—  7  +  1) 


72  T(n  —  r  +  1)1  r(n  — 27+I) 


r(n  -7  +  I) 


+ 1].(3.39) 


From  equations  (3.36)  and  (3.39)  and  the  relation  Var(Zr)  =  E(Z2)  —  E2.(Zr),  we 


Var(7\  =  **  r(n  +  1)  fr(n-r-27+l) 

(  r)  72  T(n  —  r+1)  r(n  —  27  +  1) 

_  2T(n  -  r  -  7)  +  1  cr T(n  +  l)r(n  -  r  -  7  +  1)  ~ 

r(n  —  7  +  I)  j  7  T(n  —  r  +  l)r(n  —  7  +  I)  J)C  j 

Simplifying  the  above  equation  results  in 

yarlz  ,  =  r(n  +  l)  r(n  —  r  —  27  +  I)  _  r2(n  +  l)  r»(n  -  r  -  7  +  1). 

1  ’’  72  r(n  —  r  +  1)  r(n-2i  +  l)  T2(n-r+l)  P(n-7  +  l)  *' 

(3.41) 

Letting  Qr(‘f)  =  r(.!-r+i|  r(n-~+t|'  >  results  in 


£(Z.)  =  „,  =  -{«,(  7)-l} 


(3.42) 


Var(Zr)  =  =  -e{<?r(27)  -  «,(7))2}- 


(3.43) 


A  computationally  simpler  expression  can  be  found  for  <5.(7)  by  making  use  of  the 
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properties  of  the  gamma  function.  Dividing  Qr( 7)  by  Qr_ 1(7)  we  get 


r\  f  \'  r(n+X)  r(n— r  r+1) 

Qr\l)  _  r(n-r-t-l)  r(n-~H-l)  »  ~  r  +  1 

Or_1(7)  Ilnili  r(H-r  r+2)  n  —  r  —  7  +  1 

1W'  r(n— r+2)  T(n — y+1)  '  ^ 


(3.44) 


Equation  (3.44)  reduces  to 


<?,(7)=n  "  it1  • 

,4 1  n  -  i  +  1  -  7 


(3.45) 


To  find  the  least  squares  estimates  of  the  parameters  we  write  the  following  non¬ 
linear  model  for  the  rth  sample  order  statistic 


Zr  =  E(Zt)  +  er,  r  =  1, 2, ...,  m 


(3.46) 


where  the  error  term  er  has  a  distribution  with  mean  0  and  variance  cr^.  Since  the 
order  statistics  are  not  independent,  the  errors  are  also  not  independent.  Because 
of  the  nonlinear  structure  of  the  model  in  equation  (3.46)  and  correlated  errors, 
least  squares  estimation  does  not  offer  a  straightforward  solution  to  the  estimation 
problem.  Even  so,  in  this  study  we  proceed  to  use  the  ordered  sample  least  squares 
(OSLS)  procedure  to  estimate  the  parameters. 

In  equation  (3.42),  we  note  that  the  scale  parameter  a  appears  linearly  whereas  the 
shape  parameter  7  does  not.  The  least  squares  estimates  are  obtained  by  minimizing 
the  quantity 

m  m 

S  ='£'!  =  E(Zr  -  <Qr( 7)  -  i)/i)2.  (3.47) 

r= 1  r~l 

Since  a  appears  linearly  in  the  above  expression,  minimization  can  be  achieved  ana¬ 
lytically.  Differentiating  equation  (3.47)  with  respect  to  a  and  setting  the  derivative 
equal  to  zero  results  in 

m  1 

2£((Z,  -  -Wr( 7)  -  l))((-i(<Jr(7)  -  1))  =  0.  (3.48) 

r=l  7  7 

The  solution  for  a  from  the  above  equation  is 


6(7)  =  7 


nr.lZr(<?  r(7)~l) 

l«?r(7)  -  1)!  ' 


(3.49) 


The  expression  for  a  is  substituted  in  equation  (3.47)  and  the  resulting  expression 
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is  minimized  with  respect  to  7.  After  the  substitution  the  resulting  expression  is 
nonlinear  and  minimization  cannot  be  performed  analytically.  Using  the  Nelder- 
Mead  algorithm  [48],  the  minimization  is  done  numerically.  Once  the  estimate  of  7 
is  obtained,  <7  is  obtained  from  equation  (3.49). 

Recall  that  the  GPD  is  being  used  to  approximate  the  tail  of  the  underlying  dis¬ 
tribution.  Hence,  the  ordered  statistics  Zr,  r  =  l,2,...,m,  from  the  GPD  actually 
correspond  to  the  ordered  statistics  Xn-m+i  —  x0,  Xn-m+ 2  —  x0...Xn  —  x0  from  the 
underlying  distribution. 

The  least  squares  procedure  results  in  a  computationally  convenient  algorithm.  It 
is  emphasized  that  the  minimization  of  S  is  carried  out  only  with  respect  to  the 
single  parameter  7.  Furthermore,  the  underlying  criterion  is  based  on  minimizing  the 
distance  between  the  empirical  values  and  the  expected  values  of  the  ordered  samples. 

Some  numerical  comparisons  are  given  in  section  3.6. 

3.5.2  Estimation  of  Thresholds 

The  Generalized  Pareto  Distribution  that  is  estimated  from  the  data  is  used  to 
approximate  the  tail  of  the  unknown,  underlying  distribution.  We  now  show  that 
the  threshold  is  related  to  the  approximating  distribution  function  in  a  direct  man¬ 
ner.  With  reference  to  equation  (3.19),  let  fja  denote  the  estimate  of  the  threshold 
corresponding  to  a  false  alarm  probability  a.  We  then  have 

F(fja)  =  1  -  a  =  1  -  A[1  +  l(tja  -  x0))-lh.  (3.50) 

< 7 

Solution  for  rjQ  results  in 

t)q  =  x0  +  aiq-'1  -  l)/7  (3.51) 

where  A  =  1  —  F(x 0),  q  =  (1  —  <*)/A  and  xq  =  F~1(  1  —  A).  For  many  applications 
DuMouchel  [47]  suggests  that  A  =  0.1  be  used.  As  will  be  discussed  in  the  subsequent 
sections,  the  optimal  value  of  A  depends  on  the  threshold  being  estimated.  Since  the 
distribution  function  F(x)  is  not  known,  Xo  cannot  be  determined  for  a  given  value 
of  a.  Therefore,  following  common  practice,  the  sample  order  statistic  X„_m,  where 
m  =  [An]  and  [ .  ]  denotes  the  integer  part  operator,  is  used  as  an  estimate  of  xo- 
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3.6  Numerical  Results 

3.6.1  Characterization  of  Tail  Shape  for  Known  Distributions 

We  first  discuss  a  method  for  estimating  the  parameters  of  the  GPD  when  the 
underlying  distribution  is  known.  Choose  i0  such  that  1  —  F(x o)  =  0.1.  Then  define 
the  points  p,  i=l,2,...100(J  by 

Pi  =  0.90005  +  0.0001  (t  -  1).  (3.52) 

Analytically  evaluate  Xi  =  F~1(pi)  from  the  known  distribution.  Using  the  1000 
values  of  xt,  the  maximum  likelihood  estimation,  the  ordered  sample  least  squares 
and  the  probability  weighted  moments  procedures  were  applied  to  determine  the 
corresponding  7  values  for  various  distributions.  The  results  are  given  in  Table  3.1. 
The  number  in  parentheses  for  the  Weibull  and  Lognormal  distributions  is  the  value  of 
the  shape  parameter.  For  the  remaining  distributions  the  number  denotes  the  degrees 
of  freedom.  Since  <7  is  a  scale  parameter,  the  shape  parameter  7  best  describes  the 
tail  shape.  For  the  exponential  and  the  uniform  distributions  the  value  of  7  can  be 
obtained  analytically.  7  =  0  for  the  exponential  distribution  and  is  —1  for  the  uniform 
distribution.  Since  the  size  of  the  tail  decreases  with  decreasing  7,  the  relationship 
between  the  tail  behavior  and  the  corresponding  values  of  the  shape  parameter  7  can 
be  clearly  inferred  from  this  table. 

3.6.2  Empirical  Properties  of  the  Estimators  for  Known  Distributions 

Seven  distributions  with  widely  differing  tail  behaviors  were  chosen  in  order  to 
investigate  the  adequacy  of  the  approximation  of  extreme  tails  by  the  GPD  and 
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to  compare  the  three  estimation  procedures.  The  gamma  distribution  and  Weibull 
distribution  with  shape  parameter  of  value  3  have  tails  lighter  than  those  of  the 
exponential  PDF.  The  tails  of  the  chi-square  distribution  with  4  degrees  of  freedom 
and  the  student-T  distribution  with  8  degrees  of  freedom  are  approximately  the  same 
as  those  of  the  exponential  PDF.  Finally,  the  student-T  distribution  with  4  degrees  of 
freedom  and  the  Lognormal  distribution  with  shape  parameter  of  value  1  have  tails 
heavier  than  those  of  the  exponential  PDF. 

Let  r?  and  fj  denote  the  true  and  estimated  thresholds,  respectively.  A  Monte  Carlo 
experiment  was  performed'to  investigate  the  normalized  bias,  2=2  and  the  normalized 
mean  square  error  (^)2  of  the  proposed  threshold  estimates.  The  four  sample 
sizes  given  by  m  =  25,50,100  and  1000  were  considered.  Each  set  of  samples  was 
obtained  by  generating  n  observations  and  taking  the  largest  m  =  O.ln  observations. 
For  example,  a  set  of  samples  of  size  25  was  obtained  by  selecting  the  largest  25 
observations  from  a  collection  of  250  samples.  For  each  of  the  four  dilferent  values  of 
m,  k=200,000/m  trials  were  performed  for  each  of  the  seven  distributions.  The  median 
of  the  normalized  bias  values  was  computed  for  each  distribution  and  estimation 
procedure.  The  results  for  PF  =  10“*,  k=2,3,...7  are  given  in  Table  3.2.  Similarly  the 
median  of  the  positive  square  root  of  the  normalized  mean  square  error  are  presented 
in  Table  3.3.  The  results  in  the  two  tables  differ  because  the  sign  of  (77  —  tj)/tj  is 
lost  in  the  normalized  root  mean  square  values  computed  in  Table  3.3.  Extremely 
poor  estimates  for  77  were  obtained  in  a  few  of  the  trials.  These  poor  estimates  could 
severely  influence  an  arithmetic  mean  of  the  estimates.  To  avoid  this  problem,  median 
values  were  used  in  place  of  arithmetic  means. 

The  empirical  results  in  Table  3.2  indicate  that  the  newly  proposed  ordered  sample 
least  squares  estimator  generally  has  a  smaller  normalized  bias  than  the  other  estima¬ 
tors  for  small  or  moderate  sample  sizes.  Overall  the  second  smallest  normalized  bias 
is  achieved  by  the  probability  weighted  moments  method.  The  maximum  likelihood 
estimator  has  the  largest  normalized  bias  when  Pp  >  10-5,  especially  for  the  long 
tailed  distributions.  The  normalized  bias  of  all  three  estimators  decrease  as  the  sam¬ 
ple  size  increases.  When  the  parent  distribution  is  GPD,all  three  estimators  perform 
very  well.  Even  so,  the  ordered  sample  least  square  estimator  outperforms  the  others. 
The  relatively  strong  performance  for  the  GPD  is  explained  as  follows.  The  extreme 
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value  theory  is  based  on  the  premise  that  tails  of  smooth  continuous  distributions 
tend  towards  the  GPD.  For  the  GPD,  this  premise  is  exactly  satisfied.  Hence,  the 
corresponding  performance  is  noticeably  better  than  for  other  distributions. 

The  results  for  the  median  of  the  normalized  root  mean  square  error  are  surprising. 
The  maximum  likelihood  estimator  is  known  to  be  asymptotically  efficient.  This  is 
always  true  when  the  samples  are  drawn  from  the  underlying  distribution  (in  our  case 
from  the  generalized  Pareto  distribution).  This  property  of  the  maximum  likelihood 
estimator  can  be  observed  in  Table  3.3  when  m  =1000  but  not  for  smaller  sample 
sizes.  Although  the  ordered  sample  least  squares  method  has  a  smaller  normalized 
root  mean  square  error  in  many  cases,  there  is  no  clear  winner  with  respect  to  this 
criterion. 

From  the  empirical  results  which  are  based  on  a  limited  number  of  distributions 
and  sample  sizes,  it  is  not  easy  to  make  a  strong  recommendation  as  to  which  method 
to  use  in  practice.  However,  in  terms  of  the  normalized  bias,  the  ordered  samples  least 
squares  estimator  appears  to  perform  better  than  the  other  estimators  in  estimating 
the  large  thresholds  when  Pp  <  10-6.  In  any  event,  it  is  seen  that  the  extreme  value 
theory  can  be  used  successfully  to  determine  threshold  values,  when  the  false  alarm 
probability  is  very  small. 

Two  practical  advantages  of  estimation  based  on  extreme  value  theory  are:  1)  When 
there  is  a  constraint  on  the  number  of  samples,  the  thresholds  obtained  from  extreme 
value  theory  are  expected  to  be  closer  to  the  true  thresholds  than  those  obtained  by 
conventional  Monte  Carlo  techniques.  However,  in  both  techniques  an  increase  in 
sample  size  offers  greater  accuracy  in  estimating  thresholds.  2)  Because  the  estimate 
of  the  tail  of  the  underlying  distribution  is  in  closed  form,  estimation  can  be  made 
for  thresholds  corresponding  to  extremely  small  false  alarm  probabilities  independent 
of  the  sample  size.  In  experiments  with  fixed  amounts  of  data,  this  is  an  important 
advantage. 

3.6.3  Effect  of  the  Choice  of  A  on  the  Threshold  Estimates 

As  was  mentioned  previously,  only  those  samples  which  exceed  xo  are  used  in 
estimating  the  GPD  parameters.  The  value  of  xq  is  determined  by  A.  The  results 
presented  in  Tables  3. 2-3. 3  were  obtained  by  means  of  Monte  Carlo  experiments  where 
A  =  0.1  was  used  independent  of  the  value  of  Pf  for  which  the  threshold  was  being 
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estimated.  When  the  false  alarm  probability  was  extremely  small,  the  bias  and  root 
mean  square  errors  were  quite  large  for  some  distributions.  The  smaller  the  value  of  A, 
the  better  will  be  the  GPD  approximation  over  the  extreme  tail  being  approximated. 
When  A  is  chosen  too  large,  a  better  fit  is  found  for  that  portion  of  the  distribution 
closer  to  the  center  at  the.expense  of  lesser  accuracy  in  the  extreme  tail.  Of  course, 
there  is  a  tradeoff  between  the  choice  of  A  and  the  number  of  data  samples  available 
for  determining  the  parameters  of  the  GPD. 

In  our  application  the  major  objective  is  to  approximate  the  extreme  tails  corre¬ 
sponding  to  thresholds  of  10-5  or  smaller.  Consequently,  we  explored  the  implications 
of  selecting  values  less  than  0.1  for  A.  To  accomplish  this,  we  obtained  the  theoreti¬ 
cal  values  of  x,-  for  the  standard  Normal  and  Lognormal  distributions  corresponding 
to  F-1(p,)  where  p,  =  :  i=l,2,...n,  and  n  =  1,000  and  10,000  respectively. 

These  two  distributions  are  chosen  because  they  represent  extremes:  The  Normal 
distribution  is  light  tailed  while  the  Lognormal  is  a  heavy  tailed  distribution. 

The  number  of  the  xt-  samples  used  to  determine  the  parameters  of  the  GPD  is 
given  by  An.  The  parameters  were  estimated  using  the  OSLS  procedure  for  values  of 
A  equal  to  0:1,  0.05  and  0.01.  The  resulting  GPDs  were  then  used  to  determine  the 
thresholds  for  false  alarm  probabilities  given  by  Pp  =  10"*  where  k=2,3,...7.  These 
results  are  presented  in  Figure  3.6,  where  both  the  theoretical  and  approximated 
thresholds  are  plotted  as  a  function  of  k  for  (A)  Normal  distribution  (n=10,000),  (B) 
Normal  distribution  (n=1000),  (C)  Lognormal  distribution  (n=10,000),  (D)  Lognor¬ 
mal  distribution  (n=1000).  For  k  >  5,  it  is  seen  that  A  =  0.01  (curve  b)  appears  to 
be  the  best  choice  for  approximating  the  thresholds.  The  best  results  were  obtained 
with  n  =  10,000.  However,  good  results  were  obtained  with  n  =  1,000. 

3.7  Examples 

3.7.1  Known  Distribution  Case 

To  evaluate  the  accuracy  of  the  threshold  value  estimates,  10000  random  samples 
were  generated  from  the  Gaussian  and  Lognormal  distributions  and  the  upper  tails 
of  these  two  distributions  were  modeled  as  Generalized  Pareto.  In  sections  3.4.1  and 
3.4.3,  theoretical  values  given  by  x,-  =  F-1(p,)  were  used  to  estimate  the  tail.  In 
this  section  randomly  generated  samples  are  used  in  place  of  the  theoretical  values. 
Choosing  A  =  0.01,  the  theoretical  thresholds  of  the  Gaussian  distribution  for  Pp  = 
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-0.0929 

BSM 

-0.4761 

-0.5881 

t(8) 

OSLS 

-0.0221 

-0.0186 

Eigglrgl 

-0.1975 

t(8) 

ML 

-0.0104 

-0.0468 

ESI 

1 

t(8) 

PWM 

-0.0129 

-0.0452 

ES9 

Chi-sq(4) 

OSLS 

-0.0209 

-0.0039 

wmmm 

MjHjfl 

Chi-sq(4) 

ML 

-0.0037 

0.0943 

0.6185 

Chi-sq(4) 

PWM 

-0.0144 

-0.0205 

-0.1254 

mmfmm 

Lognormal 

OSLS 

-0.0835 

-0.0982 

Lognormal 

ML 

-0-0058 

0.1836 

1.2736 

2.4832 

4.4947 

Lognormal 

PWM 

-0.0543 

-0.0878 

Pareto(-0.25) 

OSLS 

-0.0092 

0.0208 

Pareto(-0.25) 

ML 

-0.0030 

0.0523 

0.1190 

Ell 

0.2479 

Pareto(-0.25) 

PWM 

-0.0077 

0.0052 

0.0121 

EEBH9 

IBi 

Table  3.2:  Median  of  the  normalized  bias  values  for  different  percentiles.  OSLS:Ordered 
Sample  Least  Square,  ML:Maximum  Likelihood,  PWM:Probability  Weighted  Moments 
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Pf 


Normal 

Normal 

Normal 


Weibull(3) 

Weibull(3) 

Weibull(3) 


Lognormal 

Lognormal 

Lognormal 


Pareto(-0.25) 

Pareto(-0.25) 

Pareto(-0.25) 


OSLS 

ML 

PWM 


OSLS 

ML 

PWM 


OSLS 

ML 

PWM 


OSLS 

ML 

PWM 


OSLS 

ML 

PWM 


OSLS 

ML 

PWM 


OSLS 

ML 

PWM 


Table  3.2:  Median  of  the  normalized  bias  values  for  different  percentiles,  (contd.) 
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An  =  100 


Pf 

warn 

10"a 

10-“ 

MSEM 

mam 

mam il 

Normal 

OSLS 

-0.1464 

Normal 

ML 

Normal 

PWM 

-0.1524 

-0.1986 

Weibull(3) 

OSLS 

-0.0017 

-0.0888 

Weibull(3) 

ML 

0.0270 

0.1158 

Weibull(3) 

PWM 

-0.0095 

EH1 

t(4) 

OSLS 

iiiimtf 

b m 

-0.3922 

t(4) 

ML 

BB 

WSm 

-0.4174 

-0.5354 

t(4) 

PWM 

Boil 

-0.4949 

t(8) 

mmm 

-0.2578 

-0.3548 

t(8) 

-0.3157 

-0.4216 

t(8) 

-0.2892 

-0.3916 

Chi-sq(4) 

OSLS 

-0.0077 

-0.1111 

Chi-sq(4) 

ML 

0.1189 

0.2655 

0.5917 

0.8298 

Chi-sq(4) 

PWM 

-0.0238 

-0.1143 

Lognormal 

OSLS 

jgiKtmW 

Lognormal 

ML 

0.1499 

■ 

Lognormal 

PWM 

0.2315 

0.3965 

Pareto(-0.25) 

OSLS 

Pareto(-0.25) 

ML 

1 

0.2215 

0.2611 

Pareto(-0.25) 

PWM 

BBal 

IS&l 

BKI 

0.0112 

Table  3.2:  Median  of  the  normalized  bias  values  for  different  percentiles,  (contd.) 
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Pf 


Normal 

Normal 

Normal 


Weibull(3) 

Weibull(3) 

Weibull(3) 


Chi-sq(4) 

Chi-sq(4) 

Chi-sq(4) 


Lognormal 

Lognormal 

Lognormal 


Pareto(-0.25) 

Pareto(-0.25) 

Pareto(-0.25) 


OSLS 

ML 

PWM 


OSLS 

ML 

PWM 


OSLS 

ML 

PWM 


OSLS 

ML 

PWM 


OSLS 

ML 

PWM 


OSLS 

ML 

PWM 


OSLS 

ML 

PWM 


Table  3.2:  Median  of  the  normalized  bias  values  for  different  percentiles. 
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An  =  25 


Pf 

■si 

■sa 

mm 

mam 

Normal 

OSLS 

0.0558 

0.1127 

0.2022 

0.2825 

0.3507 

0.4044 

Normal 

ML 

0.0558 

0.0909 

0.1459 

0.2057 

0.2588 

0.3070 

Normal 

PWM 

0.0559 

0.1215 

0.2121 

0.2920 

0.3586 

0.4117 

Weibull(3) 

OSLS 

0.0257 

0.0577 

0.1089 

0.1580 

0.2031 

0.2415 

Weibull(3) 

ML 

0.0258 

0.0531 

0.0950 

0.1378 

0.1780 

0.2139 

Weibull(3) 

PWM 

0.0256 

0.0624 

0.1149 

0.1659 

0.2110 

0.2495 

HKHf 

OSLS 

0.1069 

0.2261 

0.4160 

0.5989 

0.7397 

0.8405 

ML 

0.1051 

0.2353 

0.4157 

0.5812 

0.7127 

0.8097 

PWM 

0.1019 

0.2329 

0.4213 

0.5956 

0.7368 

0.8344 

OSLS 

0.0781 

0.1666 

0.3073 

0.4455 

0.5701 

0.6730 

ML 

U.0779 

0.1493 

0.2554 

0.3648 

0.4689 

0.5649 

■n s 

PWM 

0.0775 

0.1752 

0.3180 

0.4544 

0.5783 

0.6787 

Chi-sq(4) 

OSLS 

0.0610 

0.1313 

0.2441 

0.3592 

0.4650 

0.5455 

Chi-sq(4) 

ML 

0.0721 

0.2179 

0.4459 

0.7901 

1.1783 

1.7789 

Chi-sq(4) 

PWM 

0.0592 

0.1384 

0.2500 

0.3622 

0.4666 

0.5446 

Lognormal 

OSLS 

0.1335 

0.2452 

0.4362 

0.6271 

0.7785 

0.8785 

Lognormal 

ML 

0.1439 

0.4007 

0.7303 

1.4149 

2.7312 

5.0774 

Lognormal 

PWM 

0.1260 

0.2582 

0.4463 

0.6281 

0.7737 

0.8705 

Pareto(-0.25) 

OSLS 

0.0409 

0.0787 

0.1348 

0.1752 

0.2017 

0.219 

Pareto(-0.25) 

ML 

0.0402 

0.0763 

0.1419 

0.2075 

0.2640 

0.3127 

Pareto(-0.25) 

PWM 

0.0411 

0.0866 

0.1430 

0.1817 

0.2084 

0.2240 

Table  3.3:  Median  RMS  errors  for  various  percentiles.  OSLSiOrdered  Sample  Least 
Square,  ML:Maximum  Likelihood,  PWM Probability  Weighted  Moments 
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An  =  50 


PF 

kbi 

10‘a 

KBI 

MEM 

KB1 

Normal 

OSLS 

0.0401 

0.0772 

0.1391 

0.1981 

0.2548 

0.3042 

Normal 

ML 

0.0394 

0.0689 

0.1122 

0.1559 

0.1959 

0.2328 

Normal 

PWM 

0.0399 

0.0865 

0.1530 

0.2192 

0.2759 

0.3273 

Weibull(3) 

OSLS 

•0.0180 

0.0393 

0.0743 

0.1135 

0.1511 

0.1854 

Weibull(3) 

ML 

0.0185 

0.0509 

0.0997 

0.1447 

0.1859 

0.2214 

Weibull(3) 

PWM 

0.0180 

0.0442 

0.0852 

0.1263 

0.1661 

0.2017 

OSLS 

0.0779 

0.1826 

0.3506 

0.5179 

0.6633 

0.7724 

HUH 

ML 

0.0768 

0.1910 

0.3602 

0.5244 

0.6688 

0.7762 

— 

PWM 

0.0760 

0.1778 

0.3332 

0.4899 

0.6303 

0.7386 

OSLS 

0.0561 

0.1228 

0.2316 

0.3503 

0.4666 

0.5698 

ML 

0.0553 

0.1219 

0.2226 

0.3385 

0.4504 

0.5529 

PWM 

0.0554 

0.1306 

0.2405 

0.3613 

0.4793 

0.5807 

Chi-sq(4) 

OSLS 

0.0431 

0.0890 

0.1678 

0.2509 

0.3351 

0.4109 

Chi-sq(4) 

ML 

0.0489 

0.1661 

0.3386 

0.5487 

0.7664 

1.1112 

Chi-sq(4) 

PWM 

0.0426 

0.0939 

0.1747 

0.2584 

0.3431 

0.4185 

Lognormal 

OSLS 

0.0975 

0.1834 

0.3439 

0.5155 

0.6660 

0.7990 

Lognormal 

ML 

0.0993 

0.3381 

0.6769 

1.3921 

2.6297 

4.7240 

Lognormal 

PWM 

0.0864 

0.1954 

0.3510 

0.5143 

0.6621 

0.8012 

Pareto(-0.25) 

OSLS 

0.0289 

0.0534 

0.0890 

0.1162 

0.1346 

0.1486 

Pareto(-0.25) 

ML 

0.0284 

0.0602 

0.1149 

0.1675 

0.2084 

0.2417 

Pareto(-0.25) 

PWM 

0.0293 

0.0616 

0.1032 

0.1320 

0.1533 

0.1666 

Table  3.3:  Median  RMS  errors  for  various  percentiles,  (contd.) 
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An  =  100 


Pf 

■EBB 

■EBB 

■EBB 

■EBB 

■ebb 

■EBB 

Normal 

OSLS 

0.0284 

0.0522 

0.0964 

0.1414 

0.1863 

0.2305 

Normal 

ML 

0.0290 

0.0517 

0.0840 

0.1123 

0.1433 

0.1689 

Normal 

PWM 

0.0281 

0.0584 

0.1090 

0.1635 

0.2134 

0.2585 

Weibull(3) 

OSLS 

0.0128 

0.0273 

0.0529 

0.0811 

0.1101 

0.1378 

Weibull(3) 

ML 

0.0131 

0.0389 

0.0790 

0.1202 

0.1570 

0.1868 

Weibull(3) 

PWM 

0.0126 

0.0312 

0.0622 

0.0942 

0.1284 

0.1590 

BKH 

OSLS 

0.0550 

0.1400 

0.2801 

0.4336 

0.5739 

0.6909 

■ip® 

ML 

0.0525 

0.1377 

0.2716 

0.4165 

0.5497 

0.6627 

■S mm 

PWM 

0.0527 

0.1334 

0.2619 

0.4046 

0.5323 

0.6469 

OSLS 

0.0386 

0.0914 

0.1770 

0.2761 

0.3758 

0.4732 

t(8) 

ML 

0.0388 

0.0896 

0.1735 

0.2710 

0.3734 

0.4763 

t(8) 

PWM 

0.0384 

0.0869 

0.1727 

0.2750 

0.3777 

0.4817 

Chi-sq(4) 

OSLS 

0.0287 

0.0649 

0.1264 

0.1932 

0.2699 

0.3373 

Chi-sq(4) 

ML 

0.0350 

0.1437 

0.2959 

0.4688 

0.6092 

0.8592 

Chi-sq(4) 

PWM 

0.0283 

0.0686 

0.1289 

0.1948 

0.2730 

0.3383 

Lognormal 

OSLS 

0.0683 

0.1527 

0.2794 

0.4174 

0.5299 

0.6290 

Lognormal 

ML 

0.0652 

0.1515 

0.2690 

0.4039 

0.5465 

0.6769 

Lognormal 

PWM 

0.0647 

0.1417 

0.2519 

0.3805 

0.5218 

0.6710 

Pareto(-0.25) 

OSLS 

0.0201 

0.0372 

0.0637 

0.0845 

0.0997 

0.1110 

Pareto(-0.25) 

ML 

'0.0197 

0.0569 

0.1192 

0.1746 

0.2221 

0.2613 

Pareto(-0.25) 

PWM 

0.0201 

0.0434 

0.0718 

0.0952 

0.1108 

0.1220 

Table  3.3:  Median  RMS  errors  for  various  percentiles,  (contd.) 
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Pf 


Normal 

Normal 

Normal 


Weibull(3) 

Weibull(3) 

Weibull(3) 


t(4) 

t(4) 

t(4) 


C 

C 

C 


Lognormal 

Lognormal 

Lognormal 


Pareto(-0.25) 

Pareto(-0.25) 

Pareto(-0.25) 


OSLS 

0.0077 

0.0182 

0.0373 

0.0643 

0.1017 

0.1440 

ML 

0.0087 

0.0160 

0.0362 

0.0632 

0.1075 

0.1476 

PWM 

0.0081 

0.0247 

0.0586 

0.1064 

0.1560 

0.2016 

OSLS 

0.0037 

0.0086 

0.0194 

0.0393 

0.0630 

0.0890 

ML 

0.0040 

0.0078 

0.0191 

0.0397 

0.0649 

0.0909 

PWM 

0.0036 

0.0108 

0.0300 

0.0578 

0.0880 

0.1192 

OSLS 

0.0203 

0.0534 

0.1383 

0.2476 

0.3717 

0.4763 

ML 

0.0213 

0.0447 

0.1083 

0.2168 

0.3326 

0.4406 

PWM 

0.0213 

0.0499 

0.1207 

0.2306 

0.3479 

0.4598 

OSLS 

0.0135 

0.0298 

0.0726 

0.1379 

0.2121 

0.3018 

ML 

0.0129 

0.0272 

0.0750 

0.1518 

0.2436 

0.3406 

PWM 

0.0129 

0.0349 

0.0939 

0.1830 

0.2863 

0.3863 

OSLS 

JD.0104 

0.0207 

0.0362 

0.0588 

0.1094 

0.1408 

ML 

0.0099 

0.0192 

0.0363 

0.0589 

0.1095 

0.1429 

PWM 

0.0100 

0.0211 

0.0400 

0.0602 

0.1103 

0.1433 

OSLS 

0.0206 

0.0528 

0.1222 

0.1836 

0.2429 

0.3276 

ML 

0.0195 

0.0434 

0.0984 

0.2012 

0.3581 

0.5999 

PWM 

0.0201 

0.0410 

0.0927 

0.1919 

0.3445 

0.5770 

OSLS 

0.0061 

0.0101 

0.0158 

0.0213 

0.0247 

0.0278 

ML 

0.0063 

0.0092 

0.0154 

0.0198 

0.0243 

0.0268 

PWM 

0.0065 

0.0126 

0.0222 

0.0306 

0.0375 

0.0428 

Table  3.3:  Median  RMS  errors  for  various  percentiles. 
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thresholds 


Fieure  3.6:  Normal  distribution,  n=10,000  Thresholds  for  PF  =  10  *.  Data  points  corre¬ 
spond  to  k  =  2, 3,...,  7.  a:True,  b:A=0.01,  c:A=0.05,  d:A=0.10. 
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thresholds 


Figure  3.6:  Lognormal  distribution,  n=10,000  Thresholds  for  Pp  =  10“*.  Data  points 
correspond  to  fc  =  2, 3,...,  7.  a:TVue,  b:A=0.01,  c:A=0.05,  d:A=0.10. 
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10-fc  k  =  2, 3. ..7  are  2.326,  3.090,  3.719,  4.265,  4.753  and  5.199,  respectively.  The 
thresholds  estimated  based  on  one  set  of  random  samples  are  2.315,  3.223,  3.847, 
4.370,  4.855  and  5.292.  For  the  Lognormal  distribution  the  theoretical  thresholds 
corresponding  to  PF  =  10“fc  k  =  2,3...7  are  10.240,  21.982,  41.224,  71.157,  115.981 
and  181.152.  Once  again,  using  A=0.01,  the  thresholds  estimated  based  on  one  set  of 
random  samples  are  10.449,  22.862,  42.473,  69.216,  112.229  and  183.495.  Note  that 
the  estimated  results  are  very  close  to  the  true  thresholds.  We  note  here  that  these 
results  were  obtained  on  the  basis  of  one  set  of  observations  from  the  two  known 
distributions,  corresponding  to  a  particular  seed  value.  For  a  different  set  of  samples 
the  estimates  will  be  different  depending  on  the  tail  behavior  of  that  set  of  samples. 
But,  unless  the  samples  are  really  not  a  true  representative  of  the  distribution  from 
which  they  are  drawn,  we  expect  that  the  estimates  based  on  different  samples  should 

give  threshold  values  that  yield  false  alarm  probabilities  close  to  the  design  value. 

3.7.2  An  Unknown  Distribution  Case 

In  the  previous  section  the  underlying  distributions  were  known  to  us  and  the 
estimates  based  on  the  extreme  value  theory  were  encouraging  for  both  light  and 
heavy  tail  behavior.  In  this  example,  we  take  a  non-Gaussian  problem  where  the 
underlying  distribution  is  unknown. 

The  two  hypotheses  characterizing  the  detection  problem  are  given  in  equations 
(2. 1-2. 2).  We  consider  the  weak  signal  case  for  which  the  clutter  is  much  stronger 
than  the  background  noise.  The  locally  optimum  detector  (LOD)  [8]  has  been  shown 
to  be  suitable  for  the  weak  signal  detection  problem.  Under  hypothesis  H\,  the  signal 
is  denoted  by  #s,-,  where  0  is  a  measure  of  the  signal  strength.  For  a  deterministic 
signal  and  a  given  set  of  observations  r  =  [rq,  r2..., r^r]T  the  LOD  performs  the  ratio 


TLOd(l)  =  “  r-ljp-  >  Vk  (3-53) 

oPr\h0{l\Ho)  Ho 

where  -Pg|//,(r|/ft)  is  the  joint  PDF  of  ri,r2,...r^  under  hypothesis  if,-:  i=0,l. 

Martinez,  Swaszek  and  Thomas  [22]  studied  the  locally  optimal  detection  problem 
for  non-Gaussian  distributions  and  considered  the  bivariate  Laplace  distribution  as  an 
example.  In  this  section  we  illustrate  the  procedure  for  determining  the  thresholds 
of  a  LOD  based  on  N=2  and  the  received  samples  under  Hq  having  the  bivariate 
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Laplace  distribution  given  by 


Mr^)  =  ^n-2KoK2rTM-'r)^]  (3.54) 

where  M  is  the  covariance  matrix  for  the  two  samples,  \M\  denotes  its  determinant, 
rrM_1r  is  equal  to  (r2  —  2 pr^r^  +  ^/(l  —  p2),  p  is  the  correlation  coefficient  between 
R\  and  R 2  and  Kq{.)  is  the  modified  Bessel  function  of  the  second  kind  of  zero  order. 
The  resulting  locally  optimum  detector  statistic  is  [22] 


^1[(2rrM-1r)1/2] 

Ko[{2rTM-'Lyl*) 


x  sT M  xr 


(3.55) 


where  s  =  (si,s2)T,  sTM~1r=  (rx  -  pr2)s1  +  (r2  -  pr1)s2  and  Kx(.)  is  the  modified 
Bessel  function  of  the  second  kind  of  first  order,  sj  and  s2  are  the  known  signal 
levels.  In  this  example  we  take  sx  =  1  and  s2  =  —1.  Because  of  the  complexity  of 
Tlod{-)-,  it  is  not  possible  to  determine  a  closed  form  expression  for  its  probability 
density  function. 

In  many  applications  in  radar,  thresholds  have  to  be  set  to  achieve  desired  false 
alarm  probabilities  based  on  a  sample  size  which  is  orders  of  magnitude  less  than 
10 / Pf-  As  will  be  pointed  out  later,  the  statistic  in  equation  (3.55)  represents  a 
worst  case  situation  in  the  sense  that  our  simulations  indicate  that  the  variance  of 
the  test  statistic  is  extremely  large.  To  investigate  the  reliability  of  the  thresholds 
estimated  based  on  extreme  value  theory  with  smaller  sample  sizes,  10,000  pairs 
of  observations  (r1?r2)  were  generated  from  the  bivariate  Laplace  distribution  given 
in  equation  (3.54),  with  p  =  0.90.  The  values  of  T]_,oD(r\ir2)  were  computed  for 
each  pair  and  sorted  in  increasing  order.  Corresponding  to  A  =  0.01,  the  largest 
100  values  of  the  underlying  statistic  (the  top  one  per  cent)  were  selected  to  fit  the 
Generalized  Pareto  Distribution.  This  experiment  was  repeated  250  times.  The 
threshold  corresponding  to  a  certain  false  alarm  probability  Pp  of  the  distribution  of 
the  statistic  Tlod^Ii  ri)  is  estimated  from  equation  (3.51)  as  fja  =  x0  +  ^[(l^)-^  — 
l]/7  where  Xq  is  the  9900t/l  largest  value  of  the  statistic.  Thresholds  were  estimated  for 
false  alarm  probabilities  Pp  =  10-fc,  k  =  2,  ...,7  for  each  repetition  of  the  experiment. 
Histograms  of  these  threshold  values  are  shown  in  figure  3.7,  for  the  different  Pps. 
To  give  a  better  appreciation  for  the  range  of  values,  the  bins  are  not  necessarily  of 
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equal  width.  The  histograms  give  an  indication  of  the  spread  in  the  threshold  values 
depending  on  the  particular  samples  collected.  From  the  histograms  corresponding 
to  false  alarm  probabilities  of  10-2,  10-3  and  10-4  we  can  see  that  the  threshold 
estimates  obtained  on  the  basis  of  even  one  set  of  samples  is  likely  to  approximately 
yield  the  desired  Pp.  Since  the  underlying  distribution  of  Tlod(-)  is  unknown,  one 
measure  of  the  accuracy  of  the  estimate  is  the  extent  to  which  most  of  the  estimates 
fall  in  one  bin  of  the  histogram.  Also,  we  can  see  that  there  is  negligible  overlap 
between  the  estimated  threshold  values  in  the  histograms  for  the  three  different  values 
of  Pp.  This  supports  the  claim  that  the  estimated  threshold  is  likely  to  yield  a  false 
alarm  probability  which  is  of  the  same  order  as  the  desired  Pp.  There  is  a  higher 
overlap  in  the  thresholds  of  the  histograms  for  Pf,=10-5,  10-6  and  10-7.  Also,  there  is 
much  higher  spread  in  the  threshold  values  estimated.  Based  on  the  excellent  results 
obtained  for  the  same  choices  of  Pp  in  the  known  cases  of  the  previous  section,  these 
results  are  surprising.  However,  it  is  explained  as  follows.  The  7  values  of  the  GPD 
estimated  for  the  different  repetitions  of  this  experiment  lie  in  the  range  0.45  —  0.55. 
This  represents  an  extremely  heavy  tailed  distribution.  From  Table  3.1  we  see  that 
the  Lognormal  distribution,  which  is  quite  a  heavy  tailed  distribution,  has  7=0.232. 
The  heavy  tailed  nature  of  the  detector  statistic  can  also  be  observed  by  comparing 
the  large  threshold  values  seen  in  the  histograms  with  the  corresponding  thresholds 
of  the  Gaussian  and  the  Lognormal  distributions.  The  variance  of  the  GPD  is  given 

by 

Vor<*>  =  irwrw)  7<0'5 

=  00  7  >0.5  (3.56) 

Thus,  the  bivariate  Laplace  results  in  a  very  highly  fluctuating  statistic  with  an  ex¬ 
tremely  large  variance.  As  such,  it  represents  a  ‘worst  case’  situation  for  empirically 
determining  the  threshold.  A  much  larger  sample  size  is  needed  to  obtain  reliable 
threshold  estimates  because  of  the  exceedingly  large  tail  of  the  underlying  distribu¬ 
tion. 

In  general,  an  indication  of  how  heavy  the  true  tail  may  be  for  an  unknown  distri¬ 
bution  is  given  by  the  estimate  of  7  for  the  GPD.  When  an  extremely  heavy  tail  is 
indicated,  another  strategy  for  estimating  the  thresholds  when  Pf  is  very  small  is  to 
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choose  the  median  value  of  the  thresholds  estimated  when  the  experiment  is  repeated 
a  specified  number  of  times  with  10,000  samples  in  each  repetition.  The  choice  of 
the  median  as  the  estimator  ensures  that  very  large  and  very  small  values  do  not 
affect  the  results.  For  the  present  example,  we  chose  to  repeat  the  250  trials  three 
times.  By  counting  the  number  of  estimates  that  fell  into  the  bins  centered  at  20,  28 
and  36  for  Pf=10-5,  40,  50,70  and  90  for  Pf=10“6  and  100  and  150  for  Pp=10-7,  it 
was  found  that  88  percent  of  the  estimates  fell  into  these  bins.  Thus,  even  for  this 
extremely  large  tailed  example,  we  believe  that  use  of  the  GPD  has  allowed  us  to 
estimate  useful  values  for  the  thresholds  with  sample  sizes  much  smaller  than  10 /Pf- 
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Figure  3.7:  Histograms  of  threshold  values.  (A)  Pp  =10  2  (B)  Pp  =  10  3  (C)  Pp  —  10 
(D)  PF  =  10“5  (E)  PF  =  10-6  (F)  PF  =  10"7 
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Figure  3.7:  Fig  3.7  Contd. 
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Figure  3.7:  Fig  3.7  Contd. 
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Figure  3.7:  Fig  3.7  Contd. 
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Chapter  4 


Performance  of  the  Locally 
Optimum  Detector  for 
Multivariate  Student-T  and 
K-Distributed  Disturbances 


In  radar  problems  involving  weak  signal  applications,  it  is  found  that  the  large  returns 
due  to  clutter  can  lead  to  a  small  signal  to  disturbance  ratio.  When  the  density 
function  of  the  clutter  exhibits  an  extended  tail  behavior  relative  to  the  Gaussian 
PDF,  the  probability  density  function  of  the  disturbance  can  no  longer  be  modeled 
as  Gaussian.  The  significance  of  a  non-Gaussian  PDF  with  an  extended  tail  is  that 
many  more  large  returns  result  than  would  be  the  case  for  a  Gaussian  PDF  having 
the  same  variance.  Hence,  there  is  a  need  to  be  able  to  model  non-Gaussian  random 
processes. 

The  multivariate  student-T  distribution  is  a  member  of  the  class  of  joint  PDFs 
arising  from  Spherically  Invariant  Random  Processes  (SIRP).  SIRPs  are  explained  in 
Chapter  2.  When  an  SIRP  is  sampled  at  N  instants  in  time,  the  resulting  vector  is  said 
to  be  a  spherically  invariant  random  vector  (SIRV).  The  theory  of  SIRPs  offers  a  way 
to  model  the  joint  density  function  on  these  N  samples  where  the  correlation  between 
the  individual  random  variables  in  the  vector  is  accounted  for.  With  this  approach 
locally  optimum  detector  structures  can  be  derived  for  non-Gaussian  disturbances 
without  the  need  to  assume  that  the  random  variables  are  statistically  independent. 
In  this  chapter  we  analyze  the  performance  of  the  LOD  for  the  known  signal  problem 
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when  the  background  disturbance  consisting  of  clutter  and  noise  can  be  modeled  as 
having  a  multivariate  student-T  distribution  or  a  multivariate  K-distribution. 

4.1  The  Multivariate  Student-T  Distribution 

A  convenient  procedure  for  deriving  the  multivariate  student-T  distribution  from 
the  representation  theorem  [28]  is  discussed  in  this  section.  Let  the  random  vector  X_ 
have  a  multivariate  Gaussian  distribution  with  zero  mean  and  covariance  matrix  M. 
The  zero  mean  assumption  will  not  affect  the  generality  of  the  results  that  follow. 
The  joint  density  function  on  the  elements  of  2L  is  given  by 

-  (27r)7V|M|1/2e  2  ^ 

where  the  vector  X_  has  2 N  elements  from  N  inphase  and  N  quadrature  samples. 
Consider  the  vector  W  =  where  u  is  a  nonnegative  random  variable  statistically 
independent  of  X_.  Let  w? M~lw  be  denoted  by  the  variable  p.  Then,  the  conditional 
density  function  of  the  vector  W  given  u  can  be  written  as 


fw(w\u)  = 


(2tt)w|M|1/2 


2JV  ----- 

vl  e  2 


The  unconditional  density  function  on  is  given  by 


roo 

fw(m)  =  / 

^0 


where  f„( v)  is  the  probability  density  function  of  the  random  variable  v.  Because  2L 
and  v  are  statistically  independent,  it  follows  that 

E(W)  =  E(—)  =  E(X)E(v~1)=  0  (4.4) 

E(WWT)  =  E(XXt)E(v~2)  =  E{v~2)M.  (4.5) 

It  can  be  seen  from  the  above  equation  that  the  covariances  of  the  elements  of  the 
vector  W  can  be  adjusted  by  appropriate  choice  of  E(i/~2). 

With  respect  to  equation  (4.3),  let  /„( i/)  be  the  generalized  chi  PDF  given  by 


fui”)  =  2 


lP.p-\e-aJ1  ofi 


v  >  0. 
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From  equation  (  4.6),  E(v  2)  can  be  calculated.  Specifically, 


E{v~2)  =  f 


2/3-1  p-av3 


,  Z/*K  *e 


— dv=  f 
Jo 


00  i/2^  3e~ot/2a/? 


Letting  ai/2  =  x  in  the  above  equation  we  get 


E(v  2)  —  a  f 
Jo 


00  x&  2e~xdx  T(8  —  1)  a 

r(fl  =  a~m~  =  J ~ 


If  we  let  a  =  8  —  1,  then  the  generalized  chi  PDF  in  equation  (  4.6)  is  such  that 
E(v~ 2)  =  1  irrespective  of  the  choice  for  the  parameter  /?.  Then  the  generalized  chi 
PDF  takes  the  form 

/,(*)  =  - - .  /?>L  (4>9) 

In  general,  we  can  set  the  value  of  E(v~2)  to  a  desired  constant  c  by  choosing  a  = 
c(/3  —  1).  Then  the  covariance  matrix  of  W_  is  guaranteed  to  equal  cM  independent 
of  j3. 

Integrating  the  conditional  density  function  fw(^w\u),  as  given  by  equation  (4.2), 
over  the  PDF  of  the  nonnegative  random  variable  1/,  we  obtain  the  multivariate 
student-T  distribution.  The  details  are  given  below.  Choosing  a  =  /3  —  1  in  equation 
(  4.6)  we  can  write 


00  1 


f°° 

f^m)  =  1  R? 


Jo  (2*)"\M\'/2‘ 

= 

(2‘k)n\M\1^2T(8) 

Letting  (/?  —  !  +  p/2)u2  —  y  we  get 


=£j,2v2P-1e-V3-1')l,2(p  -  l)^3  , 
v  e  2  _ _ _  — - —dv 


2l/2N+2(3-le-S(0-l+P/2)dU' 


fwjw)  = 


(/3-l)« 


roo  yN+P—lg—V 


r°° _ 

Jo  (8-1 


(2ir)N\M\1f2T(8)  Jo  (8-1  +  p/2)N+P 
{0  -  i)*r{N  +  0) 
(2jr)"|M|I/2r(/3)(/3-  1  +  p/2)A'+'r 


(4.10) 


(4.11) 


The  above  expression  is  defined  to  be  the  2iV-dimensional  multivariate  student-T 
distribution  with  parameters  N  and  8-  N  represents  the  number  of  complex  samples 
and  8  determines  the  tail  behavior  of  the  multivariate  density  function.  The  smaller 
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the  value  of  0,  the  larger  will  be  the  tail. 

The  density  function  in  equation  (4.9)  can  be  simulated  as  follows.  The  first  step 
is  to  generate  a  standard  Gamma  variate  having  the  density  function  fy{y)  =  }Lwr- 
This  was  done  using  the  subroutine  DGAMDF  from  the  IMSL  package.  The  next  step 
is  to  divide  the  generated  random  variable  by  the  parameter  /?  —  1.  Let  Z  =  Y/(0  —  1). 
The  density  function  of  Z  is 


fz(z)  = 


(0  -  1 


(4.12) 


The  positive  square  root  of  results  in  the  desired  density  function.  Let  v  = 
Z% .  Therefore  Z  —  v2 .  Introducing  the  Jacobian  of  the  transformation,  the  density 


function  of  v  becomes 


fM  = 


2v2P-'e-W-l>2{0-  If 


(4.13) 


which  is  identical  to  that  in  equation  (  4.9). 

4.1.1  The  Locally  Optimum  Detector 

The  locally  optimum  detector  for  the  multivariate  student-T  distribution  can  now 
be  derived.  From  equation  (2.32)  the  locally  optimum  detector  is  given  as 


dfp(r-83)  . 

d£  1 

Ml ) 


0=0 

—  <v- 
*0 


(4.14) 


Assuming  the  disturbance  can  be  modeled  by  a  multivariate  student-T  distribution, 
fR(r)  is  given  by  equation  (  4.11),  with  the  variable  R  replacing  the  variable  W, 
where  p  =  rrM_1r.  Since  equation  (4.14)  is  a  ratio  test  and  all  constants  can  be 
placed  in  the  threshold  which  is  determined  by  specifying  a  false  alarm  probability, 
all  multiplicative  constants  are  ignored  for  convenience.  Hence,  we  will  be  concerned 
only  with  the  terms  containing  the  variable  Excluding  the  constant  term  the 
numerator  in  the  ratio  test  is  given  by 


dML-Os)  i  dr  1  u 
dd  l<?=0  d9\0  —  1  +  p/2)N+P  e~°' 


(4.15) 


Applying  the  chain  rule,  the  derivative  with  respect  to  6  can  be  expressed  as  the 
derivative  with  respect  to  p  times  the  derivative  of  p  with  respect  to  0.  The  derivative 
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of  p  with  respect  to  6  at  0  =  0  can  be  derived  as 

!^|*=o  =  (-^(L-esfM-^r-Os))  |e=0  =  -2  !srA/_1r.  (4.16) 

Therefore,  the  numerator  in  the  ratio  test,  excluding  the  constant,  is  given  by 

<MLzM\e=Q  =  (/?  _  1  +  p/ 2)-<*+0+i)  x  /M-V.  (4.17) 

From  the  above  equation,  the  sufficient  statistic  for  the  locally  optimum  detector  for 
the  multivariate  student-T  distribution  can  be  written  as 

W r)  =  /?  —  1  +  p/2 "  (4'18) 

The  above  result  for  the  LOD  statistic  is  very  significant.  The  numerator  in  equation 
(4.18)  is  recognized  as  the  Gaussian  linear  detector.  This  detector  is  a  matched  filter 
which  maximizes  the  signal-to-disturbance  ratio  whether  or  not  the  disturbance  is 
Gaussian.  In  weak  signal  applications,  by  definition,  the  signal  to  disturbance  ratio 
will  still  be  low  after  matched  filtering.  The  denominator  of  the  LOD  statistic  is  the 
nonlinear  term  in  the  statistic.  The  behavior  of  the  nonlinearity  is  such  that  it  scales 
down  large  values  of  p  and  enhances  small  values  of  p.  The  nonlinearity  is  plotted  as 
a  function  of  p  in  Fig.  4.1.*  This  is  reasonable  because,  as  shown  in  Section  1.3,  large 
values  of  radar  returns  result  in  large  p  while  small  values  of  the  returns  yields  small 
values  of  p.  Because  it  is  known  a  priori  that  we  are  dealing  with  the  weak  signal 
problem,  large  returns  cannot  be  due  to  the  signal.  Consequently,  the  output  of  the 
matched  filter  is  weighted  by  a  small  number.  On  the  other  hand,  the  matched  filter 
output  is  weighted  by  a  large  number  when  the  return  is  small  and  the  contribution 
due  to  the  signal,  if  present,  can  be  detected.  However,  when  the  signal  is  present, 
the  optimum  nonlinearity  alone  is  not  sufficient  to  get  detections,  though  it  brings 
the  output  close  to  the  threshold  value.  The  matched  filter  which  has  a  higher  output 
value  when  the  signal  is  present  than  when  the  signal  is  absent,  contributes  to  raising 
the  output  value  over  the  threshold  when  detections  are  obtained.  The  role  of  the 
matched  filter  is  explained  in  greater  detail  in  Section  4.1.3. 
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Figure  4.1:  Nonlinearity  of  the  LOD  statistic  for  the  student-T  distribution 
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4.1.2  Computer  Simulation  of  Performance 

Because  analytical  closed  form  expressions  for  the  detection  and  false  alarm  proba¬ 
bilities  of  the  locally  optimum  detector  in  a  multivariate  student-T  distributed  clutter 
are  difficult  to  obtain,  performance  is  evaluated  through  computer  simulations  for 
weak  signal  applications.  For  simulation  purposes  a  multivariate  student-T  distribut¬ 
ed  disturbance  vector  D  and  a  transmitted  signal  vector  S_  have  to  be  generated.  The 
covariance  matrix  of  the  clutter  process  is  assumed  known  with  unit  elements  along 
the  diagonal.  To  get  the  covariance  matrix  of  the  disturbance  we  add  a  small  num¬ 
ber,  determined  by  the  clutter  to  noise  ratio,  to  the  diagonal  elements  of  the  clutter 
covariance  matrix.  This  serves  to  limit  the  performance  of  the  receiver  even  where 
the  clutter  power  is  negligible.  In  this  simulation,  the  clutter  to  noise  ratio  is  taken  to 
be  80  dB.  The  simulation  procedure  for  the  disturbance  vector  is  outlined  as  follows: 

.© 

1 .  Generate  a  2 JV-dimensional  white  Gaussian  random  vector  2L  •  This  was  done 
by  using  the  DRRNOA  subroutine  from  the  IMSL  package. 

2.  Do  a  Cholesky  decomposition  on  the  matrix  M  to  get  M  =  KKT  where  K  is  a 
lower  triangular  matrix. 

3.  The  vector  2L  =  KX[  is  the  multivariate  correlated  Gaussian  vector. 

4.  Generate  a  standard  Gamma  variate  Y.  This  was  done  by  using  the. subroutine 
DGAMDF  from  the  IMSL  package. 

5.  Obtain  v  —  (^y)*.  The  random  variable  v  has  the  generalized  chi  PDF. 

6.  Obtain  the  multivariate  student-T  distributed  disturbance  vector  D_  with  the 
desired  correlation  properties  from  J2  = 

The  block  diagram  of  the  simulation  procedure  is  shown  in  Fig.  4.2.  The  auto¬ 
correlation  of  the  clutter  process  is  taken  to  be  a  geometric  function  in  this  problem. 
Assuming  radar  returns  from  clutter  cells  to  be  highly  correlated,  as  is  the  case  with 
ground  clutter,  the  sample  to  sample  time  correlation  is  taken  as  0.95  in  this  problem. 
Specifically,  the  sample  autocorrelation  function  is  chosen  as 

Rcc{n)  =  { 0.95)n  n  =  0, 1,  ...,N  —  1  (4.19) 

where  Rcc(n)  is  the  discrete  time  autocorrelation  function  of  the  clutter  process. 
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Figures  4.3  and  4.4  show  the  autocorrelation  and  the  power  spectral  density  of  the 
clutter  process.  The  power  spectral  density  of  clutter  has  a  very  small  spread  as  the 
clutter  is  highly  correlated.  Using  equation  (4.19)  the  elements  of  the  covariance 
matrix  of  the  disturbance  can  be  filled  appropriately.  The  elements  of  the  signal 
vector  are  chosen  such  that  the  nth  element  Sn  =  ej2*^n-1)T,  n  =  1,2,  ...,7V.  fo 
represents  the  Doppler  frequency  shift  of  the  received  signal  and  T  represents  the 
time  separation  between  sampling  instants. 

The  detector  in  equation  (4.18)  is  now  simulated.  A  value  of  /?  =  1.5  for  the  multi¬ 
variate  student-T  distribution  is  chosen  because  this  value  results  in  a  relatively  long 
tail  for  the  corresponding  marginal  PDF  of  one  element  of  the  vector.  By  evaluating 
thresholds  for  specified  false  alarm  probabilities,  the  student-T  distribution  was  seen 
to  have  heavier  tails  than  the  Gaussian  distribution  for  false  alarm  probabilities  less 
than  10-4  but  smaller  tails  than  the  Gaussian  otherwise. 

The  thresholds  corresponding  to  false  alarm  probabilities  lO-*;  k  =  1,2, 3, 4  are 
obtained  through  the  method  of  extreme  value  theory  explained  in  Chapter  3.  Once 
the  threshold  is  set  the  detection  probabilities  are  obtained  by  simulating  the  LOD  for 
received  vectors  consisting  of  the  sum  of  the  signal  and  disturbance  vectors  for  various 
signal-to-disturbance  ratios.  The  value  of  fr>  is  chosen  to  be  zero  in  this  simulation, 
resulting  in  a  worst  case  situation.  The  number  of  trials  in  the  Monte  Carlo  simulation 
for  obtaining  detection  probabilities  is  equal  to  10, 000.  The  performance  of  the  LOD 
is  compared  to  that  of  the  Gaussian  detector  for  the  same  multivariate  student-T 
distributed  clutter.  The  test  statistic  for  the  Gaussian  detector  is  the  same  as  the 

numerator  of  the  LOD,  which  is  srM-1r. 

4.1.3  Results  of  the  Computer  Simulation 

The  results  of  the  computer  simulations  are  shown  in  Tables  4.1-4.10.  When  SCR  is 
less  than  0  dB,  it  is  seen  from  the  tables  that  the  LOD  always  outperforms  the  Gaus- 
sian  receiver  for  all  values  of  the  tabulated  false  alarm  probabilities.  The  difference 
is  especially  significant  for  false  alarm  probabilities  equal  to  10-3  and  10-4. 

The  student-T  distribution,  while  being  heavier  tailed  than  the  Gaussian,  is  not  as 
heavy  tailed  as  the  K-distribution  and  Weibull  distribution.  In  fact,  the  student-T 
distribution  may  not  be  a  likely  candidate  for  modeling  the  radar  disturbance.  The 
student-T  distribution  was  chosen  as  the  first  distribution  to  be  studied  only  because 
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Figure  4.5:  Log  power  spectral  density  of  the  clutter  and  signal  processes 
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of  the  mathematical  simplicity  and  well  behaved  nature  of  its  multivariate  PDF. 
Nevertheless,  the  analysis  done  with  the  student-T  distribution  confirms  that  the 
LOD  outperforms  the  Gaussian  receiver  for  a  non-Gaussian  weak  signal  application. 

As  the  signal  to  clutter  ratio  is  reduced,  it  can  be  observed  from  the  tables  that  the 
Gaussian  receiver  performance  degrades  abruptly  for  false  alarm  probabilities  less 
than  or  equal  to  10-2  whereas  the  LOD  shows  a  gentler  variation  in  performance. 
From  the  Tables  it  is  seen  that  the  performance  of  the  LOD  peaks  around  0  dB  and 
falls  off  for  both  higher  and  lower  values  of  SCR.  But  as  the  SCR  values  fall  below 
0  dB,  the  degradation  in. the  LOD’s  performance  is  not  as  drastic  as  that  of  the 
Gaussian  receiver.  Both  the  receivers  show  an  improvement  in  performance  as  the 
number  of  samples  is  increased.  However,  the  LOD  shows  a  dramatic  improvement  in 
performance  when  the  sample  size  is  greater  than  64.  When  the  sample  size  is  equal 
to  64  and  the  false  alarm  probability  is  as  low  as  10“ 4,  it  is  seen  from  Table  4.7  that 
the  detection  probability  resulting  from  the  LOD  is  on  the  order  of  tenths  for  SCR  as 
small  as  -8  dB.  The  corresponding  detection  probabilities  resulting  from  the  Gaussian 
receiver  are  negligibly  small.  In  fact,  there  is  an  improvement  factor  in  the  vicinity 
of  three  orders  of  magnitude  in  favor  of  the  LOD.  When  -10  dB<SCR<0  dB,  an 
interesting  observation  from  the  tables,  for  all  values  of  N  considered  is  that  the  LOD 
outperforms  the  Gaussian  receiver  by  close  to  one  order  of  magnitude  for  Pf  =  10-2, 
by  two  orders  of  magnitude  for  Pp  =  10-3  and  by  three  orders  of  magnitude  for 
Pf  =  10-4.  For  SCR  values  lower  than  -10  dB,  the  LOD  significantly  outperforms 
the  Gaussian  receiver  but  with  very  small  values  of  Pd ■  When  both  SCR>4  dB 
and  Pf  >  10-3,  the  Gaussian  receiver  outperforms  the  LOD,  as  can  be  seen  from  the 
tables.  For  positive  values  of  SCR  in  the  range  0-5  dB  and  for  false  alarm  probabilities 
equal  to  10-3  and  10-4,  it  is  interesting  to  note  that  the  LOD  still  outperforms  the 
Gaussian  receiver.  The  LOD  shows  a  significant  performance  improvement  over  the 
Gaussian  receiver  over  a  dynamic  range  of  about  14  dB.  The  end  points  of  the  range 
vary  depending  on  the  sample  size  and  false  alarm  probability. 

The  test  statistic  is  a  product  involving  the  outputs  of  matched  filter  and  the 
optimum  nonlinearity.  In  section  4.1.1  it  was  explained  how  the  nonlinearity  present 
in  the  test  statistic  boosts  small  values  of  the  received  signal  and  attenuates  large 
values.  This  not  only  serves  to  bring  down  the  value  of  the  threshold  needed  to 
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obtain  the  desired  false  alarm  probability  but  also  to  bring  the  output  for  small 
received  signals  close  to  the  threshold  value  whether  or  not  the  desired  signal  is 
present.  The  role  of  the  matched  filter  is  explained  as  follows.  The  matched  filter 
has  a  larger  output  value  when  the  signal  is  present  as  opposed  to  the  signal  absent 
case.  This  serves  to  increase  the  statistic  in  equation  (4.18).  However,  the  quadratic 
form  p,  in  general,  also  has  a  larger  value  when  the  signal  is  present  than  under  the 
Hq  hypothesis.  Thus,  the  factor  in  the  test  statistic  in  equation  (4.18)  due  to  the 
nonlinearity  decreases  when  the  signal  is  present.  This  serves  to  lower  the  value  of 
the  test  statistic.  Therefore,  detections  are  obtained  only  when  the  increase  in  the 
due  to  the  matched  filter  dominates  the  decrease  due  to  the  nonlinearity.  Simulations 
reveal  that  the  matched  filter’s  role  is  dominant  only  when  the  signal  to  clutter 
ratio  is  in  the  range  -10  dB<SCR<0  dB.  This  is  expected  since  the  linear  receiver’s 
performance  is  known  to  drop  drastically  for  very  low  signal  to  clutter  ratios.  The 
matched  filter’s  effect  is  enhanced  when  a  Doppler  is  present  in  the  desired,  received 
signal  since  the  clutter  components  become  less  correlated  with  the  Doppler  shifted 
reference  signal.  However,  one  must  be  careful  when  the  Doppler  frequency  is  so 
large  that  the  signal  spectrum  appears  in  the  tail  of  the  clutter  spectrum.  Then  a 
strong  signal  situation  may  exist  and  the  nonlinearity  which  transforms  large  values 
into  small  values  will  cause  performance  to  degrade.  The  LOD  should  not  be  used  in 
strong  signal  situations. 

The  aim  of  using  a  LOD  is  to  obtain  detection  in  Range-Doppler-Azimuth  cells 
where  conventional  space-time  processing  is  unable  to  obtain  acceptable  performance. 
In  present  day  radars  these  cells  are  ignored  because  the  probability  of  detection  is  so 
small  under  a  false  alarm  constraint.  In  general,  when  the  SCR  is  relatively  high  (>  0 
dB)  the  likelihood  ratio  test  is  the  optimal  test  for  target  detection  under  a  fixed  false 
alarm  constraint.  In  addition  to  not  performing  well  when  SCR  is  too  large,  the  LOD 
does  not  perform  well  when  SCR  is  too  small.  When  the  signal  to  clutter  ratio  drops 
below  a  certain  value,  depending  upon  N  and  Pp,  the  LOD  receiver  hardly  shows  any 
detections  even  though  it  still  outperforms  the  Gaussian  receiver.  This  is  because  the 
PDFs  under  Ho  and  Hi  are  so  close  to  each  other  that  it  is  impossible  to  discriminate 
between  them  without  increasing  the  sample  size  by  orders  of  magnitude. 

The  concepts  of  spherically  invariant  random  processes  and  locally  optimum  de- 
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SCR 

LOD 

GR 

5  dB 

Pd 

0.35 

0.65 

3  dB 

Pd 

0.36 

0.35 

1  dB 

Pd 

0.37 

0.23 

0  dB 

Pd 

0.38 

0.11 

-3  dB 

Pd 

0.32 

0.052 

-6  dB 

Pd 

0.27 

0.039 

-8  dB 

Pd 

0.23 

0.03 

-9  dB 

Pd 

0.15 

0.024 

-10  dB 

Pd 

0.10 

0.019 

Table  4.1:  N=16,  PF  =  10-2,  SCR:Signal  to  Clutter  Ratio,  LOD:Locally  Optimum  De¬ 
tector,  GRrGaussian  Receiver 


SCR 

LOD 

GR 

5  dB 

Pd 

0.08 

0.007 

3  dB 

Pd 

0.11 

0.005 

1  dB 

Pd 

0.17 

0.004 

0  dB 

Pd 

0.16 

0.003 

-3  dB 

Pd 

0.13 

0.002 

-6  dB 

Pd 

0.10 

0.001 

-8  dB 

Pd 

0.08 

0.001 

Table  4.2:  N=16,  PF  =  10-3,  SCR:Signal  to  Clutter  Ratio,  LOD:Locally  Optimum  De¬ 
tector,  GR:Gaussian  Receiver 


tectors  are  particularly  relevant  in  the  context  of  modern  radar  applications.  When 
the  radar  scans  a  volume  searching  for  targets  there  might  be  certain  regions  in  the 
volume  where  the  clutter  returns  are  significantly  stronger  than  the  desired  target 
returns.  It  is  in  these  regions  that  we  can  obtain  detections  with  the  LODs.  There  is 
a  need  to  monitor  the  environment  so  that  we  are  able  to  separate  the  clutter  regions 
from  volumes  that  are  limited  by  weak  background  noise.  When  detection  is  limited 
by  background  noise  alone,  the  LOD  is  not  applicable.  Using  the  concepts  of  artificial 
intelligence,  clutter  patches  can  be  identified  and  the  underlying  multivariate  PDF  of 
the  clutter  returns  can  be  approximated  using  the  library  of  SIRPs  that  have  been 
developed  [26].  From  the  library  of  LODs  the  LOD  corresponding  to  the  approximat¬ 
ed  SIRP  can  be  used  in  clutter  regions  to  obtain  detections  if  the  target  is  present, 
where  earlier  it  would  not  have  been  possible. 
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SCR 

LOD 

GR 

5  dB 

Pd 

K 

0.99 

3  dB 

Pd 

0.57 

*1  dB 

Pd 

0.45 

0.32 

0  dB 

Pd 

0.46 

0.18 

-3  dB 

Pd 

0.38 

0.08 

-6  dB 

Pd 

0.31 

0.06 

-8  dB 

Pd 

0.27 

0.04 

-9  dB 

Pd 

0.20 

0.033 

-10  dB 

Pd 

0.13 

0.026 

-11  dB 

Pd 

0.10 

0.019 

Table  4.3:  N=32,  Pp  =  10~2,  SCR:Signal  to  Clutter  Ratio,  LOD:Locally  Optimum  De¬ 
tector,  GR:  Gaussian  Receiver 


SCR 

LOD 

GR 

8  dB 

Pd 

0.09 

0.15 

7  dB 

Pd 

0.11 

0.08 

5  dB 

Pd 

0.14 

0.014 

2  dB 

Pd 

0.22 

0.008 

0  dB 

Pd 

0.26 

0.004 

-3  dB 

Pd 

0.19 

0.003 

-6  dB 

Pd 

0.14 

0.002 

-8  dB 

Pd 

0.11 

0.002 

-9  dB 

Pd 

0.08 

0.001 

Table  4.4:  N=32,  Pp  =  10~3,  SCR:Signal  to  Clutter  Ratio,  LOD:Locally  Optimum  De¬ 
tector,  GR:Gaussian  Receiver 


SCR 

LOD 

GR 

5  dB 

Pd 

0.44 

0.96 

3  dB 

Pd 

0.47 

0.68 

1  dB 

Pd 

0.53 

0.42 

0  dB 

Pd 

0.55 

0.30 

-3  dB 

Pd 

0.48 

0.17 

-6  dB 

Pd 

0.40 

0.10 

-8  dB 

Pd 

0.36 

0.03 

-10  dB 

Pd 

0.14 

0.023 

-12  dB 

Pd 

0.09 

0.012 

Table  4.5:  N=64,  Pp  =  10  2,  SCR:Signal  to  Clutter  Ratio,  LOD:Locally  Optimum  De¬ 
tector,  GR:Gaussian  Receiver 
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SCR 

LOD 

GR 

9  db 

Pd 

0.10 

0.17 

7  dB 

Pd 

0.15 

0.10 

5  dB 

Pd 

0.18 

0.02 

0  dB 

Pd 

0.36 

0.005 

-3  dB 

Pd 

0.28 

0.002 

-6  dB 

Pd 

0.22 

0.002 

-8  dB 

Pd 

0.18 

0.001 

-9  dB 

Pd 

0.10 

0.001 

Table  4.6:  N=64,  PF  =  10-3,  SCR:Signal  to  Clutter  Ratio,  LOD:Locally  Optimum  De¬ 
tector,  GR:Gaussian  Receiver 


SCR 

LOD 

GR 

5  dB 

Pd 

0.08 

0.0007 

0  dB 

Pd 

0.25 

0.0002 

-3  dB 

Pd 

0.19 

0.0002 

-6  dB 

Pd 

0.13 

0.0001 

-8  dB 

Pd 

0.11 

0.0001 

-9  dB 

Pd 

0.06 

0.0001 

Table  4.7:  N=64,  PF  =  10-4,  SCR:Signal  to  Clutter  Ratio,  LOD:Locally  Optimum  De¬ 
tector,  GR:Gaussian  Receiver 


4.2  The  Multivariate  K-Distribution 

In  the  previous  section  we  analyzed  the  performance  of  the  LOD  for  the  multi¬ 
variate  student-T  distribution.  The  multivariate  K-distribution  is  also  a  member  of 
the  class  of  joint  PDFs  arising  from  SIRPs.  Jakeman  [50]  has  shown  that  the  K- 
distributed  PDF  has  a  physical  interpretation  in  the  sense  that  it  arises  from  the 
random  walk  problem  where  the  number  of  steps  itself  is  random  having  a  negative 
binomial  distribution.  Also,  radar  clutter  has  been  empirically  shown  to  have  K- 
distributed  PDF.  In  this  chapter  we  analyze  the  performance  of  the  LOD  when  the 
background  disturbance  consisting  of  the  clutter  and  noise  can  be  approximated  as 
having  a  multivariate  K-distribution. 

Derivation  of  the  multivariate  K-distributed  PDF  from  the  representation  theo¬ 
rem  for  SIRPs  [28]  is  discussed  next.  Let  the  random  vector  2L  bave  a  multivariate 
Gaussian  distribution  with  zero  mean  and  covariance  matrix  M.  The  zero  mean 
assumption  will  not  affect  the  generality  of  the  results  that  follow.  The  joint  den¬ 
sity  function  on  the  elements  of  X_  is  given  by  equation  (4.1).  Consider  the  vector 


95 


SCR 

LOD 

GR 

5  dB 

Pd 

0.51 

0.998 

0  dB 

Pd 

0.65 

0.71 

-1  dB 

Pd 

0.63 

0.57 

-3  dB 

Pd 

0.57 

0.35 

-6  dB 

Pd 

0.51 

0.16 

-8  dB 

Pd 

0.46 

0.11 

-10  dB 

Pd 

0.22 

0.032 

-12  dB 

Pd 

0.15 

0.027 

-13  dB 

Pd 

0.11 

0.023 

-15  dB 

Pd 

0.09 

0.02 

Table  4.8:  N=128,  Pp  =  10”2,  SCR:Signal  to  Clutter  Ratio,  LODrLocally  Optimum 
Detector,  GR:Gaussian  Receiver 


SCR 

LOD 

GR 

5  dB 

Pd 

0.20 

0.30 

4  dB 

Pd 

0.23 

0.23 

3  dB 

Pd 

0.29 

0.19 

0  dB 

Pd 

0.48 

0.01 

-3  dB 

Pd 

0.36 

0.006 

-6  dB 

Pd 

0.30 

0.005 

-8  dB 

Pd 

0.26 

0.004 

-9  dB 

Pd 

0.14 

0.003 

-10  dB 

Pd 

0.09 

0.002 

Table  4.9:  N=128,  Pp  =  10“3,  SCR:Signal  to  Clutter  Ratio,  LOD:Locally  Optimum 
Detector,  GR: Gaussian  Receiver 


SCR 

LOD 

GR 

7  dB 

Pd 

0.08 

0.001 

5  dB 

Pd 

0.13 

0.0.0004 

2  db 

Pd 

0.30 

0.0004 

0  dB 

Pd 

0.37 

0.0003 

-3  dB 

Pd 

0.27 

0.0003 

-6  dB 

Pd 

0.21 

0.0002 

-8  dB 

Pd 

0.17 

0.0001 

-9  dB 

Pd 

0.09 

0.0001 

Table  4.10:  N=128,  Pp  =  10.  4,  SCR:Signal  to  Clutter  Ratio,  LOD:Locally  Optimum 
Detector,  GR:Gaussian  Receiver 
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W  =  v)L,  where  v  is  a  nonnegative  random  variable  statistically  independent  of  2L- 
Let  w? M~lw  be  once  again  denoted  by  the  variable  p.  Then,  the  conditional  density 
function  of  the  vector  W  given  v  can  be  written  as 

=  (4-20) 

The  unconditional  density  function  on  W  is  given  by  equation  (4.3),  where  f„{v) 
is  the  probability  density  function  of  the  random  variable  v.  Because  X_  and  v  are 
statistically  independent,  it  follows  that 

E(W)  =  E{yX)  =  E(2QE(i/)  =  Q  (4.21) 

E(WWT )  =  E{X  Xj)E{v2)  =  E{v2)M.  (4.22) 


As  is  the  case  for  the  student-T  distribution,  the  variance  of  the  elements  of  the  vector 
W  can  be  adjusted  by  appropriate  choice  of  E{v2).  Let  fu{v)  be  the  generalized  chi 
PDF  given  by  equation  (4.6).  E(i /2)  is  then  given  by 


E( S)  =  f 


.2/3-lp-ai/2  0 


or  ,  f°° 

— dv  =  /  2— 


oo  ^20+1  e-oti/2  a/3 


(4.23) 


Letting  av2  =  x  in  the  above  equation  we  get 


,  [^x°e-‘dx  T(fi+1)  l 

Jo  aT(8)  aT(8 )  a 


(4.24) 


If  we  let  a  =  then  the  generalized  chi  PDF  in  equation  (  4.6)  is  such  that  E(v2)  =  1 
irrespective  of  the  choice  for  the  parameter  /?.  Then  the  generalized  chi  PDF  takes 


the  form 


AM  =  fi  >  i. 


(4.25) 


In  general,  we  can  set  the  value  of  E(i/2)  to  a  desired  constant  C  by  choosing  o:  appro¬ 
priately.  Integrating  the  conditional  density  function  fw(w\v)  as  given  by  equation 
(4.20),  over  the  PDF  of  the  nonnegative  random  variable  v,  we  obtain  the  multivari¬ 
ate  K-distribution.  The  details  are  given  below.  Choosing  a  =  /?  in  equation  (  4.6) 
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we  can  write 


fw(m)  =  / 

Jo 


Jo  (27r)Ar|M|1/2 

=  SL 

(2ir)N\M\1/2T(P) 

Letting  pi/2  =  y  and  simplifying  we  get 


_2N  __j>  2i/2'J-1e-'3"V 

"  e  *  r(g)  * 

-  r  2v~2N^l3-1e-pv2-^du. 
I  Jo 


(4.26) 


/!t(3£)  “  /o°°  <4-27) 


From  page  183  of  Watson’s  book  on  Bessel  functions  [51],  we  have  the  result 


*>(*)  =  5(5  f  Jo  y  e~'e~(‘+'»)dy 


(4.28) 


provided  that  the  real  part  of  z 2  >  0.  Kp(z)  represents  the  modified  Bessel  function 
of  the  second  kind  of  order  p.  Combining  equations  (4.27)  and  (4.28)  results  in 


fw{w)  2(2Tr)N\M\1'2T(p)((2pp)1/2'1  PKN.0[{2Ppf/2] 

t:n\M\^2Y{P)p^ 


(4.29) 


The  above  expression  is  defined  to  be  the  27V-dimensional  multivariate  K-distribution 
with  parameters  N  and  p.  N  represents  the  number  of  complex  samples  and  p 
determines  the  tail  behavior  of  the  multivariate  density  function. 

For  simulation  purposes,  the  density  function  in  equation  (4.25)  can  be  simulated 
as  follows.  The  first  step  is  to  generate  a  standard  Gamma  variate  having  the  density 
function  fy(y)  =  y  W)  ■.  The  IMSL  package  is  used  to  generate  the  standard  Gamma 
variates.  The  next  step  is  to  divide  the  generated  random  variable  by  the  parameter 
P.  Let  X  =  Y/p.  The  density  function  of  X  is 


fx{x) 


p^xP-'e-P* 


(4.30) 


The  positive  square  root  of  K  results  in  a  variate  having  the  desired  density  function. 
Let  v  =  X%.  Therefore,  X  =  v2.  Introducing  the  Jacobian  of  the  transformation, 
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the  density  function  of  v  becomes 


/-(") 


m 


(4.31) 


which  is  identical  to  that  in  equation  (4.25).  This  random  variable  is  used  to  multiply 

the  Gaussian  vector  X  in  order  to  generate  the  K-distributed  vector  W. 

4.2.1  The  Locally  Optimum  Detector 

The  locally  optimum  detector  for  the  multivariate  K-distribution  can  now  be  de¬ 
rived.  From  equation  (2.32)  the  locally  optimum  detector  is  given  as 


dfp(r-e,) 

as 


1 0=0 


fdr) 


Hi 

<v- 

H0 


(4.32) 


Assuming  the  disturbance  can  be  modeled  by  a  multivariate  K-distribution,  /o(r) 
is  given  by  equation  (4.29),  with  the  variable  R  replacing  the  variable  W,  where 
p  =  rTM-1z\  Since  equation  (4.32)  is  a  ratio  test  and  all  constants  can  be  placed 
in  the  threshold  which  is  determined  by  specifying  a  false  alarm  probability,  all  mul¬ 
tiplicative  constants  are  ignored  for  convenience.  Hence,  we  will  be  concerned  only 
with  the  terms  containing  the  vector  R.  Excluding  the  constant  term  the  numerator 
in  the  ratio  test  is  given  by 


^l„,  =  ^Ip^^-4(2^)"]|.=o.  (4.33) 

Applying  the  chain  rule,  the  derivative  with  respect  to  6  can  be  expressed  as  the 
derivative  with  respect  to  p  times  the  derivative  of  p  with  respect  to  6.  From  equation 
4.16,  the  derivative  of  p  with  respect  to  6  at  6  =  0  is  given  by  —2 srM_1r.  Therefore, 
the  numerator  in  the  ratio  test,  excluding  the  constant,  is  given  by 


dfd—  ~  0i) 
dd 


=  -p^-'Kn-AWp)?) 

+  K'n-p  [(2/?p)  *  ] )  X  -2 irM_1r 


(4.34) 


where  K'N_p[x]  denotes  the  derivative  of  K^-p[x\  with  respect  to  x.  From  the  above 
equation,  the  sufficient  statistic  for  the  locally  optimum  detector  for  the  multivariate 
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K-distribution  can  be  written  as 


d  =  -/M- +  frM  (4.35) 

P  P  KN-0[(2pp)2] 

Equation  (4.35)  can  be  simplified  further  using  the  property  of  Bessel  functions. 
From  page  79  of  Watson’s  book  on  Bessel  functions  [51]  we  have  the  two  results: 


1.  K'v(z)  =  *Kv{z)-Kv+x(z).  (4.36) 

2.  K.v{z)  =  Ku{z).  (4.37) 


Use  of  equations  (4.36-4.37)  in  equation  (4.35)  and  combining  the  constant  term  with 
the  threshold  results  in 

p$KN_0[(2/3p)2] 

The  numerator  in  equation  (4.38)  once  again  has  a  factor  a  term  which  is  the  Gaussian 
linear  detector  or  the  matched  filter  that  maximizes  the  signal-to-disturbance  ratio 
whether  or  not  the  disturbance  is  Gaussian.  The  factor  multiplying  the  linear  detector 
is  the  optimum  nonlinearity  for  weak  signal  detection  when  the  disturbance  is  K- 
distributed.  The  nonlinearity  is  plotted  as  a  function  of  p  in  Fig.  4.5.  The  behavior 
of  the  nonlinearity  is  similar  in  form  to  the  one  obtained  for  the  student-T  distributed 
disturbance.  It  scales  down  large  values  of  p  and  enhances  small  values  of  p.  Since 
it  is  known  that  we  are  dealing  with  the  weak  signal  problem,  large  returns  cannot 
be  due  to  the  signal.  Consequently,  the  output  of  the  matched  filter  is  weighted  by 
a  small  number.  On  the  other  hand,  when  the  return  is  small,  there  is  a  greater 
chance  of  the  signal  being  detected,  if  present.  Hence,  the  nonlinearity  weights  it  by 
a  large  number.  The  role  of  the  matched  filter  is  also  similar  to  the  situation  when 

the  disturbance  was  modeled  as  having  a  student-T  distributed  disturbance. 

4.2.2  Computer  Simulation  of  Performance 

Because  analytical  closed  form  expressions  for  the  detection  and  false  alarm  prob¬ 
abilities  of  the  locally  optimum  detector  in  a  multivariate  K-distributed  clutter  are 
difficult  to  obtain,  performance  is  evaluated  through  computer  simulations  for  weak 
signal  applications.  For  simulation  purposes  a  multivariate  K-distributed  disturbance 
vector  jD  and  a  transmitted  signal  vector  £  have  to  be  generated.  The  covariance 
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matrix  of  the  clutter  process  is  assumed  known  with  unit  elements  along  the  diagonal. 
To  get  M,  the  covariance  matrix  of  the  disturbance,  we  add  a  small  number,  deter¬ 
mined  by  the  clutter  to  noise  ratio,  to  the  diagonal  elements  of  the  clutter  covariance 
matrix.  This  serves  to  limit  the  performance  of  the  receiver  even  where  the  clutter 
power  is  negligible.  In  this  simulation,  the  clutter  to  noise  ratio  is  taken  to  be  80  dB. 
The  simulation  procedure  for  the  disturbance  vector  is  outlined  as  follows: 

1.  Generate  a  27V-dimensional  white  Gaussian  random  vector  X' .  This  was  done 
by  using  the  DRRNOA  subroutine  from  the  IMSL  package. 

2.  Do  a  Cholesky  decomposition  on  the  matrix  M  to  get  M  =  KKT  where  A'  is  a 
lower  triangular  matrix. 

3.  The  vector  2L  =  K2L  is  the  multivariate  correlated  Gaussian  vector. 

4.  Generate  a  standard  Gamma  variate  Y.  This  was  done  by  using  the  subroutine 
DGAMDF  from  the^IMSL  package. 

5.  Obtain  v  =  (j)?.  The  random  variable  v  has  the  generalized  chi  PDF. 

6.  Obtain  the  multivariate  K-distributed  disturbance  vector  D_  with  the  desired 
correlation  properties  from  J2  =  vX_. 

The  block  diagram  of  the  simulation  procedure  is  shown  in  Fig.  4.6.  The  auto¬ 
correlation  of  the  clutter  process  is  taken  to  be  a  Gaussian  function  in  this  problem. 
Assuming  radar  returns  from  clutter  cells  to  be  highly  correlated,  as  is  the  case  with 
ground  clutter,  the  sample  to  sample  time  correlation  is  taken  as  0.999  in  this  problem. 
Specifically,  the  sample  autocorrelation  function  is  chosen  as 

Rcc(n)  =  exp(-0.000801n2)  n  =  0, 1, N  -  1  (4.39) 

where  Rcc(n)  is  the  discrete  time  autocorrelation  function  of  the  clutter  process. 
The  autocorrelation  and  the  power  spectral  density  of  the  clutter  process  are  shown 
in  Figures  4.7  and  4.8  respectively.  Once  again,  we  see  that  the  spectral  spread  of  the 
clutter  is  very  small.  Using  the  above  function  the  elements  of  the  covariance  matrix 
of  the  disturbance  can  be  filled  appropriately.  The  elements  of  the  signal  vector  are 
chosen  such  that  the  nth  element  Sn  =  ej2,r/c(n_1):r,  n  =  1, 2, ...,  N.  fD  represents  the 
Doppler  frequency  shift  of  the  received  signal  and  T  represents  the  time  separation 
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between  sampling  instants. 

The  detector  in  equation  (4.18)  is  now  simulated.  As  /?  — ►  oo,  the  K-distribution 
tends  to  the  Gaussian  distribution.  As  0  — »  0.5,  the  K-distribution  deviates  from 
that  of  the  Gaussian  in  the  sense  of  having  very  large  tails.  Four  different  values  of 
/3=0. 5,  1.0,  1.5  and  2.0  were  used  for  performance  evaluation. 

The  thresholds  corresponding  to  false  alarm  probabilities  10-*;  k  =  1,2, 3, 4  are 
obtained  through  the  method  of  extreme  value  theory  explained  in  Chapter  3.  Once 
the  threshold  is  set  the  detection  probabilities  are  obtained  by  simulating  the  LOD  for 
received  vectors  consisting  of  the  sum  of  the  signal  and  disturbance  vectors  for  various 
signal  to  disturbance  ratios.  The  value  of  fo  is  chosen  to  be  zero  in  this  simulation,  a 
worst  case  situation.  The  number  of  trials  in  the  Monte  Carlo  simulation  for  obtaining 
detection  probabilities  is  equal  to  10,000.  The  performance  of  the  LOD  is  compared 
to  that  of  the  Gaussian  detector  for  the  multivariate  K-distributed  clutter.  The  test 
statistic  for  the  Gaussian  detector  is  given  by  srM-1r.  Simulations  could  not  be 
carried  out  for  sample  sizes  larger  than  128  because  of  the  behavior  of  the  modified 
Bessel  functions.  The  modified  Bessel  functions  are  highly  nonlinear  and  numerical 
difficulties  arise  in  the  evaluation  of  the  Bessel  functions  for  either  small  arguments 
and  large  orders  or  large  arguments  and  small  orders.  For  the  modified  Bessel  function 
of  the  second  kind  K„(x),  v  must  not  be  so  large  compared  to  x  such  that 

*«(*)- 5 wr  (4'40) 

overflows.  With  reference  to  equation  (4.38)  it  is  noticed  that  for  v  —  0  —  N  >  128, 
the  value  x  =  2(2/3p)s  frequently  is  small  enough  to  result  in  overflow.  However, 
we  notice  that  in  equation  (4.38),  the  test  statistic  has  a  ratio  of  modified  Bessel 
functions  with  the  order  differing  by  one.  By  using  the  small  argument  approximation 
given  in  equation  (4.40),  the  overflow  problem  for  small  arguments  and  large  orders 
can  be  overcome.  On  the  other  hand,  when  the  argument  is  large  and  the  order 
is  small  underflow  problems  result.  IMSL  uses  an  iterative  scheme  to  generate  the 
modified  Bessel  functions  of  the  second  kind,  starting  from  lower  orders  and  building 
upto  higher  orders.  The  lower  order  Bessel  functions  needed  to  generate  the  higher 
order  Bessel  functions  overflow  for  sufficiently  large  values  of  the  argument.  Because 
of  this  numerical  difficulty  analysis  of  performance  is  restricted  to  sample  sizes  that 
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Figure  4.9:  Power  spectral  density  of  the  clutter  process 
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Figure  4.10:  Log  power  spectral  density  of  the  clutter  and  signal  processes 


107 


are  smaller  than  128. 

4.2.3  Conclusions 

When  the  sample  size  was  less  than  128,  the  probability  of  detection  for  the  LOD 
was  always  less  than  0.1  for  false  alarm  probabilities  less  than  or  equal  to  10-2.  Con¬ 
sequently,  in  this  section  results  are  tabulated  for  N=128.  As  mentioned  previously, 
numerical  difficulties  made  it  impossible  to  determine  performance  for  sample  sizes 
greater  than  128.  Results  are  presented  in  Tables  4.11-4.14  for  values  of  the  shape 
parameter  equal  to  0.5,  1.0,  1.5  and  2.0  when  N=128  and  Pp  =  10~2.  Recall  that 
/3  =  0.5  corresponds  to  a  very  large  tail  while  /3  =  2.0  results  in  a  distribution  that 
begins  to  approximate  the  Gaussian  tail.  Note  that  Pd  for  the  LOD  is  relatively 
constant  when  -10  dB<SCR<0  dB.  The  best  performance  is  achieved  for  /?  =  0.5 
where  Pd  peaks  at  a  value  of  0.23  and  the  LOD  performs  signicantly  better  than  the 
Gaussian  receiver  over  a  dynamic  range  of  20  dB  extending  from  SCR  equal  to  0  dB 
to  -20  dB.  For  /?  =  1.0  and  1.5,  Pd  peaks  around  0.18  and  0.19  with  a  useful  dynamic 
range  of  approximately  11  dB  extending  over  the  range  -17  dB<SCR<-7  dB.  Finally, 
for  /3  =  2.0  the  peak  value  of  the  LOD  is  0.16  and  the  LOD  performs  better  than 
the  Gaussian  receiver  over  a  7  dB  dynamic  range  extending  extending  between  -9  dB 
and  -15  dB.  The  LOD  for  the  K-distributed  disturbance  does  not  peak  at  values  of 
Pd  as  large  as  those  for  the  student-T  distribution.  This  result  agrees  with  that  of 
Spaulding  [18]  who  shows  that  we  cannot  arbitrarily  say,  by  inspection,  that  a  noise 
process  that  is  “tremendously”  non-Gaussian  (i.e.  the  noise  distribution  has  a  very 
large  tail)  can  result  in  “tremendous”  improvement  over  the  corresponding  Gaussian 
or  linear  receiver  situation.  This  behavior  is  explained  by  once  again  examining  the 
role  of  the  nonlinearity  and  the  matched  filter  in  determining  the  test  statistic.  As 
was  the  case  with  the  student-T  distribution,  the  nonlinearity  maps  large  values  into 
small  values  and  vice  versa.  However,  when  =  0.5,  the  tail  of  the  K-distribution 
is  much  heavier  than  any  of  those  encountered  with  the  student-T  distribution.  In 
addition  to  more  large  values  of  the  received  signal  due  to  the  clutter,  there  are  also 
more  small  values.  This  prevents  the  threshold  for  the  K-distribution  from  being 
lowered  as  much  as  it  was  for  the  student-T  distribution.  As  a  result,  if  a  detection 
is  to  occur,  a  larger  boost  must  be  generated  by  the  matched  filter  when  a  signal 
is  present.  Unfortunately,  with  reference  to  equation  (4.38),  the  increase  due  to  the 
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SCR 

LOD 

GR 

5  dB 

Pd 

0.13 

0.99 

1  dB 

Pd 

0.21 

0.25 

0  dB 

Pd 

0.21 

0.16 

-3  dB 

Pd 

0.22 

0.12 

-6  dB 

Pd 

0.22 

0.09 

-8  dB 

Pd 

0.23 

0.07 

-10  dB 

Pd 

0.23 

0.05 

-12  dB 

Pd 

0.20 

0.03 

-14  dB 

Pd 

0.17 

0.02 

-15  dB 

Pd 

0.15 

0.01 

-18  dB 

Pd 

0.12 

0.01 

-20  dB 

Pd 

0.08 

0.01 

Table  4.11:  N=128,  Pf  =  ltf-2,  0  =  0.5  SCR:Signal  to  Clutter  Ratio,  LOD:Locally 
Optimum  Detector,  GR:Gaussian  Receiver 


matched  filter  under  hypothesis  Hi  does  not  dominate  the  decrease  due  to  the  non¬ 
linearity.  Consequently,  not  as  many  detections  result  as  occurred  with  the  student-T 
distribution.  Finally,  it  is  pointed  out  that  the  detection  probability  was  much  less 
than  0.1  for  all  values  of  the  shape  parameter  with  N=128  and  Pp  =  10-3. 

It  should  be  noted  that  dramatically  improved  performance  might  occur  when  the 
sample  size  is  greater  than  128.  Because  the  LOD  is  nonlinear,  a  threshold  effect 
exists.  It  is  not  clear  that  N=128  is  a  sufficiently  large  sample  size  to  get  over  the 
threshold.  On  the  other  hand,  larger  sample  sizes  cause  both  numerical  difficulties 
and  may  not  be  achievable  in  an  actual  application. 

In  the  next  chapter  we  come  up  with  an  alternative  scheme  and  derive  a  detector 
with  enhanced  performance  under  weak  signal  conditions.  This  detector  is  termed  the 
amplitude  dependent  locally  optimum  detector.  This  detector  is  not  uniformly  most 
powerful.  Thus,  the  thresholds  for  obtaining  the  desired  false  alarm  probabilities  and 
the  detection  probabilities  are  functions  of  the  signal  to  clutter  ratio,  6.  It  is  shown 
that  this  detector  offers  a  significant  improvement  in  performance  for  smaller  sample 
sizes  compared  to  the  LOD  obtained  on  the  basis  of  the  uniformly  most  powerful  test. 

4.3  Determining  Locally  Optimum  Detector  Threshold  with 
Real  Data 

Since  the  locally  optimum  detectors  corresponding  to  SIRP  multivariate  density 
functions  are  nonlinear,  it  is  not  possible  to  evaluate  the  thresholds  corresponding  to 
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SCR 

LOD 

GR 

5  dB 

Pd 

0.10 

1.0 

0  dB 

Pd 

0.17 

0.35 

-3  dB 

Pd 

0.17 

0.33 

-6  dB 

Pd 

0.17 

0.19 

-7  dB 

Pd 

0.18 

0.15 

-8  dB 

Pd 

0.18 

0.13 

-10  dB 

Pd 

0.18 

0.04 

-11  dB 

Pd 

0.16 

0.04 

-13  dB 

Pd 

0.14 

0.04 

-15  dB 

Pd 

0.12 

0.03 

-17  dB 

Pd 

0.10 

0.03 

-20  dB 

Pd 

0.08 

0.02 

Table  4.12:  N=128,  Pp  —  10  2,  /?  =  1.0  SCR:Signal  to  Clutter  Ratio,  LOD:Locally 
Optimum  Detector,  GR:Gaussian  Receiver 


SCR 

LOD 

GR 

.5  dB 

Pd 

0.11 

1.0 

0  dB 

Pd 

0.18 

0.40 

-3  dB 

Pd 

0.18 

0.37 

-6  dB 

Pd 

0.19 

0.24 

-7  dB 

Pd 

0.19 

0.19 

-8  dB 

Pd 

0.19 

0.17 

-10  dB 

Pd 

0.19 

0.03 

-12  dB 

Pd 

0.16 

0.03 

-15  dB 

Pd 

0.12 

0.026 

-17  dB 

Pd 

0.09 

0.02 

Table  4.13:  N=128,  Pp  =  10  2,  j3  =  1.5  SCR:Signal  to  Clutter  Ratio,  LOD:Locally 
Optimum  Detector,  GR:Gaussian  Receiver 


SCR 

LOD 

GR 

5  dB 

Pd 

0.06 

0.99 

0  dB 

Pd 

0.14 

0.44 

-3  dB 

Pd 

0.14 

0.41 

-6  dB 

Pd 

0.15 

0.25 

-8  dB 

Pd 

0.15 

0.18 

*-9  dB 

Pd 

0.16 

0.14 

-10  dB 

Pd 

0.16 

0.08 

-12  dB 

Pd 

0.14 

0.05 

-14  dB 

Pd 

0.11 

0.03 

-15  dB 

Pd 

0.09 

0.02 

Table  4.14:  N=128,  Pp  =  10  2,  (3  =  2.0  SCR:Signal  to  Clutter  Ratio,  LOD:Locally 
Optimum  Detector,  GR:Gaussian  Receiver 
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different  false  alarm  probabilities  in  closed  form.  However,  it  is  shown  in  Chapter  3 
that  by  using  extreme  value  theory  we  can  model  the  tail  behavior  of  the  test  statistic 
using  empirically  available  outputs  of  the  LOD.  It  is  also  shown  that  this  technique 
yields  fairly  accurate  thresholds  for  sample  sizes  that  are  orders  of  magnitude  smaller 
than  those  required  for  Monte  Carlo  simulation.  In  practice,  the  cell  false  alarm 
probability  is  determined  by  specifying  the  number  of  false  alarms  allowed  per  scan 
of  the  surveillance  volume.  For  example,  if  one  false  alarm  per  scan  is  specified  and 
the  surveillance  volume  consists  of  one  million  resolution  cells,  then  the  false  alarm 
probability  for  each  cell  is1  set  at  10-6.  The  weak  signal  detector  should  be  applied 
only  to  those  cells  for  which  a  weak  signal  problem  exists,  as  discussed  in  Section 
1.3.  The  number  of  such  cells  is  typically  a  small  fraction  of  the  total  number  of  cells 
in  the  surveillance  volume.  In  addition,  because  of  the  difficulty  in  detecting  weak 
signals,  one  should  allow  for  a  few  more  false  alarms  than  is  the  case  for  strong  signal 
detection.  For  example,  assume  a  surveillance  volume  is  comprised  of  one  million 
resolution  cells  and  that  1  percent  of  these  cells  can  be  classified  as  “weak  signal”. 
Allowing  for  one  false  alarm  per  scan  due  to  the  “weak  signal”  cells,  the  false  alarm 
probability  for  each  of  these  cells  would  be  10-4.  If  reports  from  the  tracker  are  used 
to  sort  out  false  alarms,  even  larger  false  alarm  probabilities,  such  as  10-3  might  be 
acceptable.  The  false  alarm  probabilities  can  be  made  even  larger  as  the  number  of 
“weak  signal”  cells  become  smaller. 

In  practice,  reference  cells  centered  around  the  test  cell  are  used  to  determine  the 
threshold  for  the  test  celL  The  problem  is  complicated  by  the  fact  that  reference 
cells  too  far  removed  from  the  test  cell  may  not  be  representative  due  to  nonhomo¬ 
geneities  in  the  clutter.  As  a  result,  the  number  of  representative  reference  cells  is 
limited.  In  fact,  there  may  not  be  available  the  number  of  representative  reference 
cells  required  by  the  extreme  value  theory.  This  poses  a  practical  problem  in  terms 
of  implementation  of  the  LOD. 

Fortunately,  the  behavior  of  the  LOD  is  such  that  this  problem  can  be  overcome.  As 
seen  from  Figures  4.1  and  4.5,  the  nonlinearity  is  such  that,  it  transforms  large  values 
of  the  received  observations  (corresponding  to  large  values  of  the  quadratic  form  p ) 
to  small  values  and  small  values  of  the  received  observations  (corresponding  to  small 
values  of  the  quadratic  form  p )  to  large  values.  This  implies  that  the  tail  behavior  of 
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the  LOD  test  statistic  is  governed  by  the  body  of  the  multivariate  SIRP  disturbance 
PDF.  This  is  due  to  the  fact  the  tail  region  of  the  test  statistic  corresponds  to  large 
values  of  the  test  statistic.  Due  to  the  nonlinearity,  these  in  turn,  arise  from  small 
values  of  the  quadratic  form  p,  which  correspond  to  the  body  of  the  disturbance  PDF. 

The  locally  optimum  detector  based  on  the  K-distributed  disturbance  was  simulat¬ 
ed  to  see  if,  indeed,  the  tail  of  the  test  statistic  corresponded  to  disturbance  values  that 
arise  from  the  body  of  the  K-distributed  disturbance.  Since  the  disturbance  observa¬ 
tion  space  is  an  ^-dimensional  space,  corresponding  to  N  points  in  the  observation 
vector,  the  body  and  tail  of  the  disturbance  is  defined  through  the  corresponding 
quadratic  form  p.  If  the  observations  are  uncorrelated,  then  p  corresponds  to  a  hy¬ 
persphere  in  TV-dimensional  space.  The  body  of  the  disturbance  density  function, 
then  can  be  defined  as  those  points  that  lie  within  the  sphere  and  the  tail  as  those 
points  that  lie  outside  the  sphere.  For  a  SIRP  disturbance  the  univariate  density 
function  of  p  can  be  evaluated  in  closed  form.  This  enables  us  to  evaluate  the  point 
p  =  p0  such  that  /0Po  fp{p)dp  =  <5,  where  fp(p)  is  the  probability  density  function  of 
the  quadratic  form  p  and  S  is  a  number  between  0  and  1.  By  choosing  6  say  equal 
to  0.98,  we  define  the  point  po.  Then  we  classify  the  simulated  disturbance  vectors 
as  follows:  If  p  >  p0,  the  vector  arises  from  the  tail  of  the  disturbance  PDF.  On  the 
other  hand,  if  p  <  p0,  the  vector  arises  from  the  body  of  the  disturbance  PDF.  If  the 
vector  is  correlated  the  treatment  is  similar.  In  this  case,  p  corresponds  to  rTM~1r, 
where  M  is  the  covariance  matrix  of  the  disturbance.  A  constant  value  of  p  then 
corresponds  to  an  ellipse  in  a  N-dimensional  space.  A  value  for  po,  can  be  defined  in 
the  same  way  as  was  done  for  the  uncorrelated  vector  situation.  When  the  vector  is 
uncorrelated  note  that  M  is  the  identity  matrix. 

To  verify  this  idea  simulations  were  carried  out  using  the  same  covariance  matrix 
as  in  section  4.2.2.  The  value  of  8  =  0.98  was  chosen.  From  the  analytical  expression 
for  the  PDF  of  p  corresponding  to  the  K-distributed  SIRP,  the  value  of  po  was  found 
to  be  equal  to  2.15.  Hence,  observation  vectors  that  resulted  in  values  of  p  >  2.15 
were  assigned  to  to  the  tail  of  the  disturbance  PDF  and  those  for  which  p  <  2.15  were 
assigned  to  the  body  of  the  disturbance  PDF.  We  now  turn  our  attention  to  the  test 
statistic.  With  respect  to  the  PDF  of  the  test  statistic,  the  tail  region  is  defined  as  the 
region  corresponding  to  an  area  of  0.02  in  the  tail.  From  the  simulations  it  was  seen 
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that  the  small  values  of  p  mapped  on  to  either  the  left  tail  (i.e  negative)  or  the  right 
tail  (i.e.  positive)  of  the  test  statistic  PDF.  Note  that  the  output  of  the  nonlinearity  is 
nonnegative  while  the  output  of  the  matched  filter  can  be  either  positive  or  negative. 
Consequently,  the  sign  of  the  matched  filter  output  determines  whether  small  values 
of  p  map  into  the  left  or  right  tail.  The  PDF  of  the  test  statistic  under  the  null 
hypothesis  is  symmetric  with  respect  to  the  origin.  Therefore,  the  right  and  the  left 
tail  were  each  assigned  an*equal  area  of  0.01.  The  threshold  estimated  using  10,000 
computer  simulations  for  a  false  alarm  probability  of  0.01  was  131.22.  Thus,  the 
right  tail  of  the  test  statistic  is  defined  to  be  those  values  exceeding  131.22.  10,000 
random  vectors  of  dimension  N  =  16  were  then  simulated  from  the  K-distributed 
disturbance.  The  value  of  the  quadratic  form  and  the  test  statistic  were  evaluated 
corresponding  to  simulated  disturbance  vectors.  The  test  statistic  was  then  sorted  in 
ascending  order.  The  test  statistic  values  that  exceeded  131.22  and  the  corresponding 
quadratic  form  values  were  noted.  It  was  found  that  the  values  of  p  corresponding  to 
the  test  statistic  values  greater  than  131.22  were  all  less  than  1.72.  From  the  data  it 
is  also  seen  that  the  smaller  the  value  of  the  quadratic  form,  the  higher  is  the  value 
of  the  test  statistic.  This  confirms  that  the  tail  values  of  the  test  statistic  arise  from 
the  body  of  the  disturbance  PDF. 

To  utilize  the  Ozturk  algorithm  discussed  in  Section  3.1.1  it  is  required  that  ap¬ 
proximately  100  of  the  neighboring  cells  be  representative  of  the  test  cell.  Ozturk’s 
algorithm  can  then  be  used  to  accurately  approximate  the  body  of  the  multivariate 
PDF.  Once  the  body  of  the  disturbance  PDF  is  approximated  accurately,  it  can  be 
employed  to  generate  the  much  larger  sample  size  required  to  estimate  the  thresh¬ 
olds  of  the  test  statistic  through  extreme  value  theory.  Good  results  are  expected  for 
the  threshold  estimation  when  the  body  of  the  disturbance  PDF  has  been  accurately 
approximated  by  the  Ozturk  algorithm  because  the  tail  of  the  test  statistic  is  caused 
by  the  body  of  the  disturbance  PDF.  Thus,  we  see  that  the  number  of  reference  cells 
required  for  threshold  estimation  is  approximately  100  although  we  still  have  to  gen¬ 
erate  from  the  approximated  disturbance  PDF  much  larger  sample  sizes  off  line  in 
order  to  make  use  of  the  extereme  value  theory. 

When  the  disturbance  random  variables  arise  from  the  student-T  distributed  dis¬ 
turbance,  a  similar  nonlinear  mapping  is  seen  whereby  the  tail  of  the  test  statistic 
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is  caused  by  the  body  of  the  disturbance  PDF  and  vice  versa.  A  procedure  similar 
to  the  one  followed  for  the  K-distributed  disturbance  case  can  also  be  used  to  reduce 
the  number  of  reference  cells.  Such  a  reduction  in  the  number  of  required  reference 
cells  makes  the  implementation  of  these  locally  optimum  detectors  practical. 
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Chapter  5 

Performance  of  the  Amplitude 
Dependent  Locally  Optimum 
Detector 


In  Chapter  4  we  analyzed  the  performance  of  the  locally  optimum  detector  when  the 
underlying  disturbance  was  modeled  by  the  student-T  and  K-distributions,  respec¬ 
tively.  It  was  seen  that  the  performance  of  the  LOD  was  not  as  good  as  desired  for  the 
case  in  which  the  disturbance  is  approximated  as  K-distributed.  In  this  chapter  we 
come  up  with  an  alternate  form  of  the  the  locally  optimum  detector  which  takes  into 
account  the  amplitude  of  the  weak  signal.  Such  a  detector  is  no  longer  uniformly  most 
powerful  in  the  sense  that  the  thresholds  for  different  false  alarm  probabilities  and 
detection  probabilities  is  a  function  of  the  signal  to  clutter  ratio.  We  will  then  com¬ 
pare  the  performance  of  the  amplitude  dependent  locally  optimum  detector  (ALOD) 
with  that  of  the  LOD  for  the  K-distributed  and  student-T  disturbance  models. 

The  uniformly  most  powerful  test  for  a  deterministic  signal  utilizes  the  ratio  of  the 
derivative  with  respect  to  the  signal  strength  of  the  Nth  order  joint  PDF  under  H\ 
to  the  Nth  order  joint  PDF  under  Ho-  The  limit  of  this  ratio  as  the  signal  strength 
tends  to  zero  is  evaluated  to  obtain  the  test  statistic  for  the  decision  rule.  The 
amplitude  dependent  locally  optimum  detector  also  utilizes  the  ratio  of  the  derivative 
with  respect  to  the  signal  strength  of  the  Nth  order  joint  PDF  under  Hi  to  the  Nth 
order  joint  PDF  under  Ho.  However,  for  the  ALOD  we  do  not  evaluate  this  ratio  as 
the  signal  strength  tends  to  zero  but  leave  it  as  a  function  of  6,  the  square  root  of 
the  signal  to  clutter  ratio.  Note  that  6  also  corresponds  to  the  signal  strength  since, 


115 


as  explained  in  Chapter  2,  the  variance  of  the  clutter  is  taken  to  be  unity.  Because 
6  is  unknown  a  priori,  a  bank  of  receivers  tuned  to  different  values  of  6  must  be 
implemented.  Such  an  approach  is  analogous  to  that  of  a  Doppler  filter  bank  used 
in  Range-Doppler  processing  of  radar  signals.  Instead  of  having  a  a  bank  of  Doppler 
filters  the  ALOD  employs  a  bank  of  amplitude  filters.  The  output  of  each  filter  will 
be  maximized  when  the  signal  amplitude  is  matched  to  the  amplitude  for  which  the 
filter  is  designed. 

5.1  The  Amplitude  Dependent  Locally  Optimum  Detector 
for  the  Multivariate  K-Distributed  Disturbance 

The  amplitude  dependent  locally  optimum  detector  for  a  given  value  of  the  signal 
to  clutter  ratio,  0,  is  defined  to  be 


de 

.feW 


(5.1) 


Assuming  the  disturbance  can  be  modeled  by  a  multivariate  K-distribution,  /d(l) 
under  H0  is  given  by  equation  (4.29)  where  p  =  rTM-1r.  On  the  other  hand,  under 
Hi  the  quadratic  form  in  fD_(r  —  6s)  is  pg  =  ( r  —  6s)TM~l(r  —  6s )  where  the  subscript 
6  is  used  to  emphasize  that  we  are  not  taking  the  limit  as  6  approaches  zero.  Since 
equation  (5.1)  is  a  ratio  test,  all  constants  can  be  placed  in  the  threshold  which 
is  determined  by  specifying  a  false  alarm  probability.  Therefore,  the  multiplicative 
constants  are  ignored  for  convenience.  Hence,  we  will  be  concerned  only  with  the 
terms  containing  the  vector  R.  Excluding  constants  the  numerator  in  the  ratio  test 
is  given  by 

-- ea)  =  (5.2) 

Applying  the  chain  rule,  the  derivative  with  respect  to  6  can  be  expressed  as  the 
derivative  with  respect  to  pg  times  the  derivative  of  pg  with  respect  to  6.  The  deriva¬ 
tive  of  pg  with  respect  to  6  can  be  derived  as 


^  =  ( J^(r  -  Os)tM  1(r  —  6s))  =  —2stM  1(r  -  6s).  (5.3) 
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Therefore,  differentiating  the  right  hand  side  of  equation  (5.2)  with  respect  to  0  and 
excluding  the  constant,  we  get 


dfoiL  ~  Os) 

80 


=  g — ( Pe )t*~  1Kn-p[(2  Ppg)*} 


(5.4) 


where  lg  =  2sTM~l(r  —  0s),  the  subscript  0  on  /  is  again  used  to  emphasize  that 
we  are  not  taking  the  limit  as  0  approaches  zero  and  the  prime  denotes  the  differen¬ 
tiation  operation  with  respect  to  the  argument  of  the  function.  Equation  (5.4)  can 
be  simplified  further  by  using  equations  (4.36-4.37)  The  amplitude  dependent  locally 
optimum  test  can  then  be*written  as 


Talod(l)  = 


(p*)iV  Kn-AWp)*) 


(5.5) 


Notice  that  the  Bessel  function  in  the  numerator  is  one  order  higher  than  the  Bessel 
function  in  the  denominator.  This  is  significant.  In  the  likelihood  ratio  test,  the 
orders  of  the  Bessel  functions  in  the  numerator  and  denominator  are  identical.  This 
order  difference  between  the  Bessel  function  in  the  numerator  and  denominator  of 
the  ALOD  makes  the  ALOD  more  sensitive  to  small  perturbations  than  the  classical 
form  of  the  Neyman- Pearson  test. 

The  computer  simulation  of  performance  was  carried  out  in  the  same  manner  as 
described  in  section  4.2.2.  The  values  of  the  shape  parameter  /3  chosen  in  this  sim¬ 
ulation  are  0.5,  1.0,  1.5  and  3.0.  With  reference  to  equation  (5.5),  it  is  noticed  for 
v  =  N  —  /3  >  16  that  the  value  x  =  (2flp)?  frequently  assumes  small  enough  values 
to  result  in  overflow.  Also,  we  notice  in  equation  (5.5)  that  the  test  statistic  has  a 
ratio  of  modified  Bessel  functions  with  the  order  differing  by  one.  Since  the  Bessel 
functions  in  the  numerator  and  the  denominator  of  the  test  statistic  differ  in  both 
the  argument  and  the  order,  the  small  argument  approximation  given  in  equation 
(4.40)  cannot  be  used.  Because  of  this  numerical  difficulty  analysis  of  performance  is 

restricted  to  sample  size  equal  to  16. 

5.1.1  Results  of  Computer  Simulation 

The  results  of  the  computer  simulation  are  shown  in  Tables  5. 1-5.4  for  16  complex 
samples  and  /?=0.5,  1.0,  1.5  and  3.0,  respectively.  Performance  is  evaluated  only  for 
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values  of  SCR  less  than  or  equal  to  0  dB  because  we  are  interested  in  the  problem  of 
weak  signal  detection.  From  the  tables  we  observe  that  the  best  performance  of  the 
ALOD  is  obtained  for  0  =  0.5.  For  this  value  of  0  the  K-distribution  has  a  maximum 
deviation  from  the  Gaussian  in  the  sense  of  having  a  large  tail.  When  0  =  3.0,  the 
tail  of  the  K-distribution  closely  approximates  that  of  the  Gaussian. 

It  is  seen  from  the  tables  that  the  ALOD  for  the  K-distribution  significantly  out¬ 
performs  both  the  Gaussian  receiver  as  well  as  the  LOD.  The  performance  of  the  LOD 
for  K-distributed  disturbance  is  tabulated  in  chapter  4,  for  N=128.  Some  of  those 
results  are  shown  again  in  Tables  5. 1-5.4.  Table  5.1  shows  the  detection  probabili¬ 
ties  obtained  for  different  false  alarm  probabilities  and  signal  to  clutter  ratios  when 
0  =  0.5.  For  Pf  =  10-2  the  ALOD  significantly  outperforms  the  Gaussian  receiver 
over  a  30  dB  dynamic  range  extending  from  0  dB  to  -30  dB.  The  peak  value  of  Pd 
equals  0.5.  A  similar  result  to  that  obtained  for  the  performance  of  the  LOD  with  a 
student-T  distributed  disturbance  is  observed  here.  The  amplitude  dependent  locally 
optimum  detector  outperforms  the  Gaussian  receiver  by  one  order  of  magnitude  for 
PF  =  10-2,  two  orders  of  magnitude  for  Pf  =  10-3,  three  orders  of  magnitude  for 
Pf  =  10-4  and  four  orders  of  magnitude  for  Pp  =  10-5.  The  useful  dynamic  range 
of  the  ALOD  is  smaller  for  decreasing  values  of  Pp.  The  useful  dynamic  range  is 
about  30  dB  for  false  alarm  probabilities  equal  to  10-2  and  10-3,  25  dB  for  false 
alarm  probability  equal  to  10-4  and  20  dB  for  a  false  alarm  probability  equal  to  10-5. 
Recall  that  the  detection  probabilities  resulting  from  the  LOD  for  the  K-distributed 
disturbance  were  negligibly  small  when  Pf  was  lower  than  10-2.  The  ALOD,  on  the 
other  hand,  yields  significant  values  of  Pd  for  Pf  as  low  as  10-5. 

As  the  value  of  0  is  increased  it  is  noticed  from  the  tables  that  the  peak  value 
of  Pd  as  well  as  the  dynamic  range  of  the  receiver  decreases.  This  arises  because 
the  tail  of  the  K-distribution  becomes  closer  to  that  of  the  Gaussian.  In  fact,  when 
0  =  3.0,  the  detection  probabilities  obtained  with  the  ALOD  for  Pp  =  10~4  and 
10-5  are  negligible.  Also,  the  useful  dynamic  range  of  the  ALOD  is  about  15  dB  for 
Pp  =  10-2  and  less  than  10  dB  for  Pp  =  10-3.  It  is  interesting  to  note  that  the 
Gaussian  receiver  shows  an  improvement  in  performance  for  increasing  values  of  0. 
The  improvement,  however,  is  only  marginal. 

The  behavior  of  the  amplitude  dependent  locally  optimum  detector  can  be  un- 
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derstood  as  follows.  With  reference  to  equation  (5.5),  the  ALOD  can  be  factored 
into  three  terms.  The  first  is  lg.  The  second  is  — an<^  third  term  is 

The  term  lg  behaves  identically  to  that  of  the  Gaussian  receiver.  It 

KN-0[(2  0p)7] 

yields  a  higher  value  when  the  signal  is  present  than  when  it  is  absent  case.  The  ran¬ 
dom  variable  p$,  in  general,  tends  to  assume  a  lower  value  whether  or  not  the  signal 
is  present,  compared  to  the  term  p.  Since  is  negative  with  magnitude  greater 
than  one,  the  second  term  in  the  ALOD  assumes  a  value  greater  than  one  whether 
or  not  the  signal  is  present.  However,  from  the  simulations,  it  is  seen  in  most  cases 
that  the  second  factor  in  the  ALOD  assumes  a  larger  value  when  the  desired  signal  is 
presen  than  when  it  is  not  present.  The  modified  Bessel  function  of  the  second  kind, 
being  a  monotonically  decreasing  function  assumes  large  values  when  the  argument 
is  small  and  small  values  when  the  argument  is  large.  Since  pg  <  p  in  general,  the 
third  term  in  the  ALOD  also  has  a  value  greater  than  one  whether  or  not  the  signal 
is  present.  Once  again,  from  the  computer  simulations,  it  is  observed  that  the  third 
term  has  a  higher  value  in"  most  cases  when  the  desired  signal  is  present.  Thus,  the 
amplitude  dependent  locally  optimum  detector  for  the  K-distributed  disturbance  has 
three  factors  where  each  of  the  three  factors,  in  general,  assumes  larger  values  under 
hypothesis  Hi  than  under  hypothesis  Ho.  This  contributes  to  the  increased  sensi¬ 
tivity  of  the  ALOD  to  weak  signals  for  the  K-distributed  disturbance  to  resulting  in 
dramatically  improved  performance  over  that  of  the  LOD  with  a  much  larger  sample 
size.  Recall  that  the  LOD  has  two  factors,  one  of  which  increases  when  the  desired 
signal  is  present  and  the  other  of  which  decreases.  The  increase  in  the  value  of  the 
test  statistic  of  the  ALOD  when  the  signal  is  present  is  not  restricted  to  weak  signal 
situations.  It  will  also  hold  true  for  strong  signal  situations.  However,  when  there  is 
a  strong  signal  situation,  the  likelihood  ratio  test  is  the  optimal  test  and  the  ALOD 
should  not  be  used. 

5.2  The  Amplitude  Dependent  Locally  Optimum  Detector 
for  the  Student-T  Distributed  Disturbance 

Assuming  the  disturbance  can  be  modeled  by  a  multivariate  student-T  distribution, 
fD(r)  under  H0  is  given  by  equation  (4.11)  where  p  =  rrM-1r.  On  the  other  hand, 
as  before,  under  H\  the  quadratic  form  in  /d(l  —  @l)  is  Pe  =  (l  —  Os)TM~l(r  —  Os) 
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SCR 

Pf 

ALOD 

GR 

LOD  (N=128) 

0  dB 

10~2 

Pd 

0.50 

0.06 

0.21 

-5  dB 

10~2 

Pd 

0.46 

0.04 

-10  dB 

10-2 

Pd 

0.40 

0.02 

0.23 

-15  dB 

lO"2 

Pd 

0.32 

0.01 

0.15 

-20  dB 

lO"2 

Pd 

0.22 

0.01 

0.08 

-25  dB 

10"2 

Pd 

0.17 

0.01 

-30  dB 

10"2 

Pd 

0.11 

0.01 

0  dB 

10~3 

Pd 

0.41 

0.003 

-5  dB 

10"3 

Pd 

0.39 

0.001 

-10  dB 

10"3 

Pd 

0.35 

0.001 

-15  dB 

10"3 

Pd 

0.27 

0.001 

-20  dB 

10"3 

Pd 

0.17 

0.001 

-25  dB 

10"3 

Pd 

0.13 

0.001 

-30  dB 

10"3 

Pd 

0.08 

0.001 

0  dB 

10~4 

Pd 

0.35 

0.0003 

-5  dB 

10“4 

Pd 

0.32 

0.0001 

-10  dB 

10'4* 

Pd 

0.28 

0.0001 

-15  dB 

10"4 

Pd 

0.23 

0.0001 

-20  dB 

10~4 

Pd 

0.16 

0.0001 

-25  dB 

10-4 

Pd 

0.09 

0.0001 

0  dB 

10"5 

Pd 

0.30 

10~5 

-5  dB 

10"5 

Pd 

0.28 

10"5 

-10  dB 

10“5 

Pd 

0.25 

10"5 

-15  dB 

10-5 

Pd 

0.18 

10"6 

-20  dB 

10"5 

Pd 

0.10 

10“5 

Table  5.1:  N=16,  /?  =  0.5,  SCR:Signal  to  Clutter  Ratio,  ALOD:Amplitude  Dependent 
Locally  Optimum  Detector,  GR:Gaussian,  LODrLocally  Optimum  Detector  Receiver 
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Table  5.2:  N=16,  /?  =  1.0,  SCR:Signal  to  Clutter  Ratio,  ALOD:Amplitude  Dependent 
Locally  Optimum  Detector,  GR:Gaussian  Receiver,  LOD:Locally  Optimum  Detector 


Table  5.3:  N=16,  /?  =  1.5,  SCR:Signal  to  Clutter  Ratio,  ALOD:Amplitude  Dependent 
Locally  Optimum  Detector,  GR:Gaussian  Receiver,  LOD:Locally  Optimum  Detector 
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SCR 

Pf 

ALOD 

GR 

LOD  (N=128) 

0  dB 

10-:i 

Pd 

0.36 

0.18 

0.14 

-5  dB 

10-2 

Pd 

0.26 

0.10 

-10  dB 

10~2 

Pd 

0.17 

0.03 

0.16 

-15  dB 

»— ‘ 
o 

1 

kJ 

Pd 

0.11 

0.01 

0.09 

0  dB 

10^ 

Pd 

0.18 

0.06 

-5  dB 

10"3 

Pd 

0.13 

0.009 

-10  dB 

10"3 

Pd 

0.06 

0.004 

Table  5.4:  N=16,  /?  =  3.0,  SCR:Signal  to  Clutter  Ratio,  ALOD:Amplitude  Dependent 
Locally  Optimum  Detector,  GR:Gaussian  Receiver,  LOD:Locally  Optimum  Detector 


where  the  subscript  9  is  used  to  emphasize  that  we  are  not  taking  the  limit  as  6 
approaches  zero.  Since  equation  (5.1)  is  a  ratio  test,  all  constants  can  be  placed  in 
the  threshold  which  is  determined  by  specifying  a  false  alarm  probability.  Therefore, 
the  multiplicative  constants  are  ignored  for  convenience.  Hence,  we  will  be  concerned 
only  with  terms  containing  the  vector  R.  Excluding  constants  the  numerator  in  the 
ratio  test  is  given  by 

dfpjr  -  6s)  =  8_. _ 1 _ 

.89  d9[(0-l+pe/2)N+91'  1  ' 

Applying  the  chain  rule,  as  was  done  with  the  K-distributed  disturbance  differentiat¬ 
ing  with  respect  to  9  and  excluding  the  constant,  the  numerator  in  the  ratio  test  is 
given  by 

d/c(r  -  Os)  U _  (  . 

89  (/?  - 1  +  pe/2YN+9+i) 

where  lg  =  2 sTM~1{r  —  9s).  From  the  above  equation  the  sufficient  statistic  for 
the  amplitude  dependent  locally  optimum  detector  for  the  multivariate  student-T 
distribution  can  be  written  as 


_  W~l+p/2)"+'> 

alodU  (/3  —  !  +  pfl /2)iv+0+1 ' 


(5.8) 


Notice  that  the  exponent  in  the  denominator  of  the  test  statistic  is  one  more  than 
that  in  the  numerator.  In  the  likelihood  ratio  test  the  exponents  in  the  numerator  and 
the  denominator  of  the  test  statistic  are  identical.  But  the  factor  1$  appearing  in  the 
ALOD  will  not  be  present.  The  computer  simulation  of  the  ALOD  for  the  multivariate 
student-T  distributed  disturbance  was  carried  out  as  mentioned  in  section  4.1.2.  The 
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results  of  the  computer  simulation  are  shown  in  Tables  5.5,  5.6  and  5.7. 

5.2.1  Conclusions 

As  was  the  case  for  the  K-distributed  disturbance,  performance  is  evaluated  only 
for  values  of  SCR  less  than  or  equal  to  0  dB.  Even  though  the  ALOD  may  outperform 
the  Gaussian  receiver  for  values  above  0  dB  of  SCR  up  to  a  certain  level,  the  decision 
rule  that  should  be  used  in  such  situations  is  the  optimum  one  based  on  the  likelihood 
ratio  test.  The  value  of  /?  =  1.5  is  chosen  because  it  represents  a  heavy  tail  situation. 
Three  different  sample  sizes  of  N=16,  32  and  64  were  chosen  for  simulation  purposes. 

It  is  seen  from  the  tables  that  the  ALOD  significantly  outperforms  the  Gaussian 
receiver  in  the  range  of  SCR  values  considered.  The  performance  improvement  of 
the  ALOD  over  the  Gaussian  receiver  is  similar  to  that  of  the  LOD  discussed  in 
Section  4.1.3.  Compared  to  the  Gaussian  receiver,  the  ALOD  has  about  an  order 
of  magnitude  improvement  in  performance  for  Pp  =  10-2,  two  orders  of  magnitude 
improvement  in  performance  for  Pf  =  10~3  and  three  orders  of  magnitude  improve¬ 
ment  in  performance  for  Pp  =  10-4.  The  performance  improvement  of  the  ALOD  for 
the  student-T  distribution  compared  to  that  of  the  LOD  is  not  as  dramatic  as  was 
noticed  for  the  K-distributed  case.  For  the  student-T  distribution  the  performance  of 
the  locally  optimum  detector  and  the  amplitude  dependent  locally  optimum  detector 
are  actually  quite  close.  The  ALOD  for  the  K-distribution  significantly  outperformed 
both  the  LOD  and  the  Gaussian  receiver.  The  ALOD  for  the  student-T  distribution, 
while  significantly  outperforming  the  Gaussian  receiver  in  the  range  of  SCR  values 
considered,  does  not  outperform  the  LOD  significantly.  In  fact,  comparing  the  detec¬ 
tion  probabilities  resulting  from  the  ALOD  with  that  of  the  detection  probabilities 
resulting  from  the  LOD,  it  is  noticed  that  the  performance  of  the  ALOD  exceeds  that 
of  the  LOD  by  less  than  a’tenth  for  SCR  values  less  than  0  dB. 

With  reference  to  equation  (5.8),  the  ALOD  can  be  factored  into  two  terms.  The 
first  term  is  lg.  The  behavior  of  lg  was  explained  in  Section  5.1.1.  The  output  of 
lg  and  the  test  statistic  increase  when  the  signal  is  present  as  opposed  to  the  signal 
absent  case.  The  second  term  is  a  ratio  of  two  polynomials.  In  general,  as  discussed 
in  Section  5.1.1,  pg  <  p  whether  or  not  the  signal  is  present.  Therefore,  in  general, 
the  ratio  of  the  two  polynomials  is  a  number  greater  than  unity.  It  is  seen  in  the 
simulations  that  this  ratio  has  a  higher  value  under  the  signal  present  hypothesis 


123 


SCR 

Pf 

ALOD 

GR 

LOD 

0  dB 

lO"* 

Pd 

0.48 

0.11 

0.38 

-3  dB 

10~2 

Pd 

0.36 

0.052 

0.32 

-6  dB 

10~2 

Pd 

0.30 

0.039 

0.27 

-9  dB 

10"2 

Pd 

0.19 

0.024 

0.15 

-10  dB 

io-2 

Pd 

0.12 

0.019 

0.10 

0  dB 

10"a 

Pd 

0.34 

0.003 

0.16 

-3  dB 

10"3 

Pd 

0.23 

0.002 

0.13 

-6  dB 

10~3 

Pd 

0.15 

0.001 

0.10 

-8  dB 

10"3 

Pd 

0.10 

0.001 

0.08 

Table  5.5:  N=16,  /?  =  1.5,  SCR:Signal  to  Clutter  Ratio,  ALOD:Amplitude  Dependent 
Locally  Optimum  Detector,  GR:Gaussian  Receiver,  LODrLocally  Optimum  Detector 


SCR 

Pf 

ALOD 

GR 

LOD 

0  dB 

IO"2 

Pd 

0.48 

0.18 

0.46 

-3  dB 

10"2 

Pd 

0.39 

0.08 

0.38 

-6  dB 

10"2 

Pd 

0.34 

0.06 

0.31 

-9  dB 

io-2 

Pd 

0.24 

0.033 

0.20 

-10  dB 

TO"2 

Pd 

0.15 

0.026 

0.13 

-12  dB 

10-2 

Pd 

0.11 

0.013 

0  dB 

10“3 

Pd 

0.37 

0.004 

0.26 

-3  dB 

10“2 

Pd 

0.25 

0.003 

0.19 

-6  dB 

10“3 

Pd 

0.16 

0.002 

0.14 

-8  dB 

IO"3 

Pd 

0.11 

0.002 

0.11 

Table  5.6:  N=32,  /?  =  1.5,  SCR:Signal  to  Clutter  Ratio,  ALOD:Amplitude  Dependent 
Locally  Optimum  Detector,  GR:Gaussian  Receiver,  LOD:Locally  Optimum  Detector 


more  often  than  it  does  under  the  signal  absent  hypothesis.  This  contributes  to  an 
increased  sensitivity  to  the  presence  of  weak  signals.  That  explains  why  the  ALOD 
outperforms  the  LOD.  The  increase  in  the  value  of  the  test  statistic  of  the  ALOD 
when  the  signal  is  present  is  not  restricted  to  weak  signal  situations  alone.  The 
LOD  also  has  two  factors  in  the  test  statistic.  However,  when  the  signal  is  present, 
one  factor  increases  and  the  other  decreases.  This  is  not  a  favorable  situation  for 
raising  the  output  of  the- test  statistic  over  the  threshold  for  small  signals.  The 
LOD’s  nonlinearity  for  the  student-T  distribution,  which  is  the  second  factor  in  the 
test  statistic,  is  not  as  highly  nonlinear  as  was  observed  for  the  K-distributed  case. 
Thus,  the  performance  improvement  of  the  ALOD  with  respect  to  the  LOD  is  not  as 
significant. 
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SCR 

Pf 

ALOD 

GR 

LOD 

0  dB 

IS 

Pd 

0.59 

0.30 

0.55 

-3  dB 

mi 

Pd 

0.52 

0.17 

0.48 

-6  dB 

Pd 

0.46 

0.10 

0.40 

-9  dB 

ii§ 

Pd 

0.40 

0.03 

-10  dB 

Pd 

0.16 

0.023 

0.14 

-12  dB 

Pd 

0.12 

0.012 

0.09 

0  dB 

iB 

Pd 

0.40 

0.005 

0.36 

-3  dB 

5 

Pd 

0.30 

0.002 

0.28 

-6  dB 

10"3 

Pd 

0.23 

0.002 

0.22 

-9  dB 

10"3 

Pd 

0.11 

0.001 

0.10 

0  dB 

Pd 

0.36 

0.0007 

0.25 

-3  dB 

Pd 

0.24 

0.0002 

0.19 

-6  dB 

Pd 

0.15 

0.0001 

0.13 

-8  dB 

ESI 

Pd 

0.12 

0.0001 

0.11 

Table  5.7:  N=64,  0  =  1.5,  SCR-.Signal  to  Clutter  Ratio,  ALOD: Amplitude  Dependent 
Locally  Optimum  Detector,  GR:Gaussian  Receiver,  LOD:Locally  Optimum  Detector 
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Chapter  6 
Conclusions 


6.1  Summary 

Conclusions  and  suggestions  for  future  work  are  presented  in  this  chapter.  We 
have  addressed  the  problem  of  weak  signal  detection  in  correlated  multivariate  non- 
Gaussian  noise.  Weak  signal  detectors  were  derived,  based  on  the  locally  optimum 
decision  rule,  using  the  concept  of  Spherically  Invariant  Random  Processes  (SIRP) 
for  modeling  the  radar  disturbance  .  Locally  Optimum  Detectors  (LOD)  are  useful 
only  in  the  neighborhood  of  the  point  for  which  they  are  evaluated.  For  non-Gaussian 
problems  the  test  statistic  derived  for  the  Locally  Optimum  Detector  are  nonlinear. 
Due  to  the  non-Gaussian  and  nonlinear  nature  of  the  problem,  thresholds  needed 
to  set  specified  false  alarm  probabilities  cannot  be  obtained  in  closed  form.  The 
Generalized  Pareto  Distribution  (GPD)  in  conjunction  with  the  method  of  Extreme 
Value  Theory  was  used  to  obtain  accurate  approximations  to  thresholds  for  spec¬ 
ified  false  alarm  probabilities.  This  was  achieved  with  orders  of  magnitude  fewer 
samples  compared  to  Monte  Carlo  simulation.  Performance  analyses  of  the  locally 
optimum  detectors  for  the  multivariate  K-distribution  and  the  student-T  distribu¬ 
tion  were  carried  out  by  means  of  computer  simulations.  Finally,  the  concept  of  the 
Amplitude  Dependent  Locally  Optimum  Detector  (ALOD)  was  introduced.  For  the 
K-distribution  the  ALOD  was  shown  to  greatly  outperform  the  LOD. 

The  following  significant  contributions  appear  in  this  dissertation: 

1.  Under  the  assumption  that  the  radar  clutter  can  be  modeled  as  a  SIRP  the 
canonical  model  for  the  Locally  Optimum  Detector  was  shown  to  be  the  product  of 
the  Gaussian  linear  receiver  and  a  zero  memory  nonlinearity. 
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2.  The  computational  requirements  needed  to  set  thresholds  for  very  small  false 
alarm  probabilities  were  reduced  by  orders  of  magnitude  using  the  GPD  in  conjunction 
with  the  method  of  Extreme  Value  Theory.  Accurate  thresholds  were  determined  by 
introducing  Ordered  Samples  Least  Squares  technique  to  estimate  the  parameters  of 
the  GPD.  For  example,  only  20,000  samples  were  required  for  different  distributions 
to  establish  thresholds  corresponding  to  a  false  alarm  probability  equal  to  10-6. 

3.  In  contrast  to  the  available  literature,  where  LODs  are  evaluated  based  on  the 
assumption  of  an  infinite  sample  size,  performance  of  the  LODs  were  obtained  for 
finite  sample  sizes. 

4.  The  new  concept  of  the  Amplitude  Dependent  Locally  Optimum  Detector 
(ALOD)  was  introduced. 

5.  Performance  of  the  Amplitude  Dependent  Locally  Optimum  Detector  was  eval¬ 
uated  for  finite  sample  sizes.  The  ALOD  had  a  significant  performance  improvement 
over  the  LOD  when  the  disturbance  was  K-distributed. 

As  a  result  of  the  above  contributions,  practical  decisions  can  be  made  with  respect 
to  use  of  LODs.  These  decisions  will  be  based  on  the  available  sample  size,  the  desired 
detection  and  false  alarm  probabilities  and  the  signal  to  clutter  ratio. 

6.2  Suggestions  for  Future  Research 

This  research  has  led  to  many  important  results  and  has  also  raised  a  number  of 
interesting  questions.  In  particular,  some  of  the  issues  arising  as  extensions  to  this 
work  are  to: 

1.  Compare  performance  of  the  LOD  and  the  ALOD  with  that  of  the  classical 
likelihood  ratio  test  for  a  broad  range  of  signal  to  clutter  ratios  and  sample  sizes.  This 
will  establish  the  conditions  under  which  there  is  a  need  for  weak  signal  detectors. 

2.  Analyze  performance  of  the  LOD  and  the  ALOD  for  other  multivariate  proba¬ 
bility  density  functions  in  the  SIRP  class,  such  as  Weibull  and  Rician. 

3.  Establish  confidence*  intervals  for  the  thresholds  estimated  based  on  Extreme 
Value  Theory. 

4.  Extend  all  of  the  above  work  to  Space-Time  processing. 

5.  Study  the  role  of  signal  design  in  enhancing  performance  of  the  LOD. 
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Appendix  A 

Issues  Related  to  Extreme  Value 
Theory 

A.l  Limiting  Forms  for  the  Largest  Order  Statistic 

Let  X\  <  X2  <  ...  <  Xn  be  the  ordered  statistics  of  n  random  variables  having 
a  common  distribution  function  F(x).  Assuming  that  the  trials  of  drawing  the  ran¬ 
dom  variables  from  the  distribution  function  F(x')  are  independent,  the  distribution 


function  of  the  largest  order  statistic  Xn  is  given  by 

F(Xn  <  x)  =  P{X1<x,X2<x,...,Xn<x) 

=  Fn(z).  (A.l) 

When  F  is  continuous  but  unknown,  an  asymptotic  theory  is  developed  for  F  in  the 
range  0+  to  1_  [45].  It  is  shown  that  positive  sequences  {an}  and  {in}  exist  such  that 

£>)=  Km  P(X .  <  «„*  +  K)  A(*)  (A.2) 

or  equivalently,  by  means  of  equation  (A.l),  that 

Jirn  Fn(anx  -f  bn)  -*  A(x).  (A.3) 

Let  n  =  md  in  equation  (A.3).  d  is  a  fixed  positive  constant  so  that  as  n  — *  oo, 
m  — *  oo.  Using  the  fact  that  n  =  md,  we  can  write 

Jim  Fmd(amdx  +  bmd )  =  lim  Fn(anx  +  bn)  ->  A(x).  (A.4) 

m—+  oo  n — ►oo  '  '  '  v  7 
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It  is  also  true  that 


lim  [Fm(amx  +  bm)]d  =  lim  Fmd{amx  +  bm)  -  Ad{x).  (A.5) 

m— +oo  m—>oc 

If  equations  (A.4)  and  (A.5)  hold,  then  from  a  theorem  of  Hintchin  [52],  there  exist 
numbers  Ad>  0  and  Bd  >  0  such  that 

A  d(Adx  +  Bd)  =  A(»)  (A.6) 


for  all  integer  values  of  d. . 

Solution  of  the  above  functional  equation  yields  all  the  possible  limiting  forms  for 
the  distribution  function  Fn(x).  The  constant  Ad  may  or  may  not  be  unity.  If  it  is 
unity,  then  the  functional  equation  to  be  solved  is  given  by 

Ad(x  +  Bd)  =  A(x).  (A.7) 

On  the  other  hand,  if  Ad  is  not  unity,  the  form  of  equation  (A.6)  stands  and  there 
exists  a  value  xod  —  Bd/{  1  —  Ad)  such  that 

Ad(xod)  =  A(xo<f).  (A.8) 

Constraining  the  solution  to  the  above  equation  to  be  real  and  nonnegative,  the 
solution  is  either  A  =  0  or  1.  However,  because  A(x)  is  a  distribution  function  the 
value  of  A  can  be  0  only  if  xod  is  the  lower  endpoint  at  which  A(x0d)  =  0+  and  A  can 
be  1  only  if  Xod  is  the  upper  end  at  which  A(xod)  =  1--  Since  Ad  and  Bd  are  assumed 
to  be  finite,  xod  must  also  be  finite.  Consequently,  there  is  no  loss  in  generality  by 
assuming  that  the  endpoint  of  interest  is  located  at  the  origin  (i.e.,  xod  =  0).  When 
Ad  ^  1,  note  that  x0 d  =  0  implies  Bd  =  0.  As  a  result,  the  solutions  for  equation 


(A.6)  fall  into  three 

cases  which  are  given  below. 

1) 

Ad(x  +  Bd)  =  A(x) 

Ad  —  1 

(A.9) 

2) 

A  d(Adx)  =  A(x) 

Ad  7^  1  F  =  0  when  x  =  0 

(A.10) 

3) 

A  d(Adx)  =  A(x) 

Ad  ^  1  F  =  1  when  x  =  0 

(A.ll) 
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A. 1.1  Case  1 


Case  (1)  of  equation  (A. 9)  is  solved  as  follows.  Taking  the  logarithm,  we  have 


log  A(x)  =  d  log  A(x  +  Bd). 


(A. 12) 


Multiplying  through  by  a  minus  sign  and  taking  the  logarithm  of  both  sides,  we  obtain 


log[-log  A(a;)]  =  log  d  +  log[-log  A(x  +  Bd )]. 


(A. 13) 


For  simplicity,  let 


g(x)  =  log[-log  A(z)]. 


Then  equation  (A. 13)  becomes 


(A.14) 


g(x)  =  log  d  +  g(x  +  Bd ). 


(A. 15) 


Equivalently, 


g(x  -  Bd)  =  log  d  +  g(x) 


(A.16) 


g(x)  =  g(x  -  Bd)  -  log  d. 
Adding  equations  (A. 15)  and  (A. 17),  we  obtain 


(A.17) 


g(x  +  Bd)  +  g(x  -  Bd)  =  2 g(x). 


(A.18) 


The  above  equation  is  valid  for  all  x  if  and  only  if  ^(x)  is  linear  in  x.  Specifically,  let 


y(x)  =  kx  +  j 


(A.19) 


where  j  and  k  are  constants.  Then 


g(x  +  Bd)  —  k(x  +  Bd)  +  j  =  g(x)  —  log  d  —  kx  +  j  —  log  d. 


(A.20) 


It  follows  that 


kBd  =  —log  d  or  k  = 


log  d 


(A.21) 
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Substituting  equation  (A.21)  in  equation  (A. 19),  we  see  that 


g(x)+*J^=j.  (A.22) 

Using  equation  (A. 14),  this  result  becomes 

log[-log  A(x)]  +  *  jj^--  =  j.  (A.23) 

Thus,  we  have 

log [  log  A(x)]  =  +  j.  (A.24) 

Hence,  for  case  (1)  of  equation  (A.9)  to  hold,  log[—log  A(x)j  must  be  linear  in  x. 

We  now  solve  for  the  sequence  { Bd }.  For  this  purpose,  let  d  =  pq  where  p  and  q 
are  both  integers.  Note  that 


Ap?(x  +  B„)  =  A(x).  (A.25) 

From  the  above  equation  we  get 

A(x  +  Bpq)  =  Ap5(x) 

=  [Ap(x)]«  =  [A(x  +  Bp) « 

=  A»(x  +  Bp)  =  A({x  +  Bp)  +  Bq)  =  \(x  +  Bp  +  Bq).  (A.26) 
Equation  (A.26)  implies  that 

Bpq  =  Bp  +  Bq.  (A.27) 

We  now  determine  the  functional  dependence  of  the  sequence  {Bd}  on  the  subscript 
d.  To  emphasize  this  functional  dependence,  we  rewrite  equation  (A.27)  as 

B(pq)  =  B(p)  +  B(q).  (A.28) 

From  the  above  equation,  it  is  clear  that  the  functional  dependence  is  logarithmic. 
Thus,  the  solution  for  Bd  is  given  by 

B(d )  =  Bd  =  log  d  (A.29) 
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Substituting  equation  (A.29)  into  equation  (A.24)  yields 

log[-log  A(x)]  =  -x  +  j 

(A. 30) 

where  j  plays  the  role  of  a  location  parameter.  Hence,  without  loss  of  generality,  j  is 
chosen  to  be  zero.  The  above  equation  then  simplifies  to 

log[-Iog  A(x)]  =  -x. 

(A.31) 

Solution  for  A(x)  results  in 

A(x)  =  exp(—e~x). 

(A.32) 

Equation  (A. 32)  is  the  solution  of  equation  (A.9)  for  case  1. 

A. 1.2  Cases  2  and  3 

The  solutions  to  Cases  (*2)  and  (3)  of  equation  (A. 10)  and  (A. 11)  are  now  derived. 

In  both  cases  we  have 

Ad(Adx)  =  A(x). 

(A.33) 

From  equation  (A. 33)  we  get 

log  A(x)  =  d  log  A (Adx). 

(A.34) 

Multiplying  through  by  a  minus  sign  and  taking  the  logarithm  of  both  sides, 

we  obtain 

log[-log  A(x)]  =  log  d  +  log[-log  A {Adx)\. 

(A.35) 

As  in  case  1,  let 

g{x)  =  log[-log  A(x)]. 

(A.36) 

Then  equation  (A.35)  becomes 

g(x)  =  log  d  +  g(Adx). 

(A.37) 

Alternatively, 

X 

=  l°9d  +  g{x) 

(A.38) 
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(A.39) 

(A.40) 


(A.41) 

and 

g(x)  =  ±fc  log  (— x)  for  x  <  0  (A.42) 

where  k  is  a  positive  constant.  Use  of  equation  (A. 36)  in  equations  (A.41)  and 
(A.42)  yields 

log[— log  A(x)]  =  log  x  for  x  >  0  (A.43) 

log[-log  A(x)]  =  ±k  log  (-x)  for  x  <  0.  (A.44) 

For  Case  2,  A  =  0  when  x  =  0.  This  implies  x  =  0  is  the  lower  end  point  of  A(x). 
Hence,  A(x)  is  nonzero  for  x  >  0.  Therefore,  our  solution  is  given  by  equation  (A.43) 
where  we  must  choose  the  sign  in  front  of  k  to  be  negative.  Then 

log[— log  A(x)]  =  —  k  log  x  x  >  0  (A.45) 

which  results  in 

A(x)  =  exp(—x~k)  x  >  0.  (A.46) 

For  case  3,  A  =  1  when  x  =  0.  This  implies  that  x  =  0  is  the  upper  endpoint 
of  A(x).  Hence,  A(x)  is  nonzero  for  x  <  0.  Consequently,  the  solution  is  given  by 
equation  (A.44)  where  we  choose  the  sign  in  front  of  k  to  be  positive.  Then 

log[-log  A(x)]  =  k  log(-x)  x  <  0  (A.47) 

resulting  in 

A(x)  =  exp(—(—x)k)  x  <  0.  (A.48) 
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or  equivalently, 

X 

g(x)  =  ~iog  d  +  g(—). 

Adding  equations  (A. 37)  and  (A.39)  results  in 

g(Adx)  +  g{^~)  =  2g(x). 
The  solution  to  the  above  equation  is 

<?(x)  =  ±k  log  x  for  x  >  0 


Thus,  the  three  possible  forms  for  the  limiting  distribution  A(x)  that  arise  as  solu¬ 
tions  to  equation  1  are  given  as  follows: 


1) 

A(x)  =  exp(—e~x) 

(A.49) 

2) 

A(x)  =  exp(—x~k)  x  >  0  ,  k  >  0 

(A.50) 

3) 

A(x)  =  exp(—(—x)k)  x  <  0  ,  k  >  0. 

(A.51) 

A. 2  Tails  of  Probability  Density  Functions 

Equations  (A.49-A.51)  represent  the  the  three  possible  limiting  forms  of  the  distri¬ 
bution  function  for  almost  all  smooth  and  continuous  probability  density  functions. 
By  differentiating  the  three  functions,  we  obtain  analytical  expressions  for  the  limiting 
forms  of  the  probability  density  functions.  However,  because  of  the  differentiation,  it 
should  be  recognized  that  these  expressions  may  not  be  good  approximations  to  the 
density  functions.  In  practice,  extreme  value  theory  should  always  be  applied  to  a 

distribution  function,  or  equivalently,  the  area  under  the  density  function. 

A. 2.1  Case  1 

The  derivative  of  A(x)  is  given  by 

H(x )  =  —A(x)  =  exp(—e~x).(—e~x)(— 1)  =  e~xexp(—e~x)  =  exp(—x  —  e~x ). 

(A. 52) 

In  our  application  we  are  interested  in  the  right  tail  of  the  probability  density  function. 
Since  we  have  to  set  thresholds  corresponding  to  small  false  alarm  probabilities,  the 
thresholds  will  be  in  the  right  tail  of  the  probability  density  function.  When  x  is  very 
large,  x  e~x .  Therefore,  equation  (A.52)  can  be  simplified  to  obtain  the  PDF  of 
the  tail  as 

H(x)  =  e~x  x  large.  (A.53) 

A. 2. 2  Case  2 

The  derivative  of  A(x)  is  given  by 

H(x)  =  —A(x)  —  exp(—x~k).(kx~k~1) 
ax 

=  k  ezp(-x- V*'1),os  x)  =  k  exp(-x~k  -  (k  +  1  )log  x).  (A.54) 
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When  x  is  very  large  log  x  >  x  k.  Therefore,  equation  (A. 54)  can  be  simplified  to 
obtain  the  PDF  of  the  tail  as 

H(x)  =  ke~(k+1V°9X  =  kx~(k+V  x  >  0,x  large  k  >  0.  (A.55) 

A.2.3  Case  3 

The  derivative  of  A(x)  for  this  case  is  given  by 

H(x)  =  ^A(x)  =  exp(—(—x)k).(k(—x)k~1) 

=  k  exp(-(-x)  V*"1),os  ("x)  =  k  exp(-(-x)k  +  (k  -  1  )log  x).  (A.56) 

When  —x  is  very  large,  (— x)k  log  x.  Therefore,  equation  (A.56)  can  be  simplified 

to  obtain  the  PDF  of  the  tail  as 

H(x)  =  ke~(-x'*k  x  <  0,  — x  large  k>  0.  (A. 57) 

A  basic  assumption  in  the  above  development  is  that  successive  trials  are  indepen¬ 
dent.  This  led  to  equation  (A.l).  In  practice,  as  n  becomes  large,  it  may  be  difficult 
to  ensure  the  independence  of  successive  trials.  To  the  extent  that  the  assumption 
holds,  the  results  in  equations  (A.49-A.51)  are  valid. 

A. 3  PDF  of  the  rth  Order  Statistic 

Suppose  that  the  ordered  samples  Xi  <  X-i  <  ...  <  Xn  are  drawn  from  the  distri¬ 
bution  function  F(x).  Let  us  further  assume  that  the  trials  used  to  draw  the  samples 
from  the  distribution  are  independent.  Consider  the  rth  order  statistic  Xr.  Recall 
that  P(Xr  <  x)  is  the  distribution  function  of  Xr.  This,  in  turn,  is  the  probability 
that  at  least  r  of  the  X-s  are  less  than  or  equal  to  x.  Treating  this  as  a  Binomial 
problem,  the  distribution  function  is 

FXr(x)  =  P(X,  <*)  =  £  -  F(x)r'  (A.58) 

where  the  ith  term  in  the  summation  is  the  binomial  probability  that  exactly  i  of 
X\,  X2, ...,  Xn  are  less  than  or  equal  to  x.  Equation  (A.58)  can  also  be  represented  in 
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the  form  of  an  integral, 


^(*>-(rzi  )&-r)ijr)f-(i-ty~« 


(A.59) 


which  can  be  verified  by  using  integration  by  parts  in  equation  (A.59).  The  probability 
density  function  of  the  rth  order  statistic  is  the  derivative  of  FXr(x)  and  is  given  by 


fxXx)  = -FXr{x)  = 

=  (r-ri)l(n  _  r)! *”*(»)[»  ~  n*))~rf(*)  (A.60) 


where  f(x)  =  ^F(x).  Equation  (A.60)  represents  the  general  form  of  the  PDF  of 
the  rth  order  statistic.  If  F(x)  is  known,  then  the  mean  and  the  variance  of  the  rth 
order  statistic  can  be  calculated.  The  expected  value  of  Xr  is  given  by 

E(Xr)  =  (r  -  Bit  -  r)i  II  zF"'W  ~  n*T-’f(*)dx.  (A.61) 

An  alternate  form  for  the  expected  value  of  XT  can  be  obtained  by  letting  u  =  F(x). 
Therefore,  x  =  F-1(u).  The  infinite  limits  of  the  integral  in  the  above  equation  then 
become  finite  after  the  transformation.  The  transformed  integral  is 

E(Xr)  =  (r-l)l(n-r)l  ll  ~  «)"-'**•  (A.62) 

The  variance  of  the  rth  order  statistic  is  expressed  as 


Var(Xr)  =  E[(Xr  -  E(Xr))*}  =  E(X?)  -  E\Xr ).  (A.63) 

Making  use  of  equation  (A.60),  E(X?)  can  be  written  as  follows. 

£(*?)  =  (r  _  !)”(„  _  r),  jT  -  F(*)]~ 7(*)<l*.  (A.64) 

An  alternate  form  for  the  expected  value  of  XT  can  be  obtained  by  again  letting 
u  =  F(x).  We  then  get 

E W>  =  (r  _!)?(„.  r)|  -  «)~*u  (A-65) 
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The  variance  of  Xr  can  be  calculated  from  equations  (A. 62)  and  (A. 65)  when  F  (u) 
is  known. 
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